ISSN  0316-6295 


Theory  of  Database  Mappings 
by 

Anthony  C.  Klug 
Technical  Report  CSRG-98 
December  1978 


COMPUTER  SYSTEMS  RESEARCH  GROUP 

UNIVERSITY  OF  TORONTO 




The  Computer  Systems  Research  Group  (CSRG)  is  an  interdisciplinary 
group  formed  to  conduct  research  and  development  relevant  to  computer 
systems and their application. It is jointly administered by the Department
of Electrical Engineering and the Department of Computer Science
of  the  University  of  Toronto,  and  is  supported  in  part  by  the  National 
Research  Council  of  Canada. 


Abstract 

The  idea  of  a  mapping  is  important  within  database  management; 
most  state-of-the-art  database  management  facilities  involve 
mappings.  In  view  of  this  prominent  role  played  by  mapping 
concepts,  it  is  desirable  to  have  a  sound  theoretical  foundation 
on which to base further research in advanced database
management facilities. This thesis contributes to the
development of such a foundation.

Mathematically precise definitions of structure and operation
mappings are given. In this formalism, data models and mappings
resemble abstract machines and their mappings. Desirable
structure mappings are defined to be the ones which produce
consistent views from consistent base schemas. Several classes
of desirable operation mappings are defined. One classification
specifies the conditions under which an operation is correctly
interpreted, and another specifies when consistent states result
from the operation mapping. We also compare our formalism with
two others from the literature, and we compare our formalism and
predicate calculus.

Having  defined  the  desirable  properties  of  mappings,  we 
endeavor  to  give  necessary  and  sufficient  conditions  for  these 
properties  to  hold  within  a  relational  setting.  The  relational 
setting  contains  functional  dependencies,  equality  constraints 
and  relational  algebra.  Determining  desirable  structure  mappings 
is  shown  to  be  equivalent  to  generating  all  valid  constraints  on 
relational  algebra  expressions.  This  problem  is  undecidable  when 
set  difference  is  among  the  relational  algebra  operators.  By 
removing  this  operator,  we  have  a  decidable  problem,  and 
procedures  are  given  for  determining  the  "good"  structure 
mappings. 

One  class  of  relational  operation  mappings  is  studied  in 
detail.  Recognizing  this  class  of  mappings  is  undecidable  in 
general,  but  a  restriction  analogous  to  the  one  above  produces  a 
decidable  problem,  and  necessary  and  sufficient  conditions  are 
given  for  this  restricted  problem. 

Finally,  we  remark  on  how  the  conditions  given  for  determining 
the  various  types  of  mappings  within  a  relational  context  can  be 
used  within  the  contexts  of  other  data  models. 


Acknowledgements


It  is  with  pleasure  that  I  now  express  my  appreciation  to  the 
following  people  and  groups  for  their  many  contributions  to  the 
completion  of  this  thesis: 

To  my  supervisor  Dennis  Tsichritzis  for  his  initial 
suggestions  and  prodding,  for  his  unceasing  confidence  in  my  work 
and  for  his  final  support. 

To the members of my committee, P.I.P. Boulton, D.G. Corneil,
B. Perrault, K.C. Sevcik, E.V. Swenson and R.W. Taylor, for their
careful  reading  of  the  thesis  which  resulted  in  a  much  improved 
product. 

To my colleagues John Carter, Fred Lochovsky and
Yannis Vassiliou for much time spent in proofreading, technical
discussions  and  logistic  help. 

To  my  friend  Patricia  Muldowney  for  invaluable  moral  support. 

Finally,  to  the  National  Research  Council  of  Canada  and  the 
Department  of  Computer  Science  of  the  University  of  Toronto  for 
financial  support. 




Contents


Part I  Introduction

Chapter 1.  Introduction
    1.1.  Statement of Problem
    1.2.  Motivation
        1.2.1.  Data Translation
        1.2.2.  Multiple View Support
        1.2.3.  Multiple Model Support
        1.2.4.  ANSI/SPARC Architecture
        1.2.5.  Database Terminals
        1.2.6.  Distributed Databases
        1.2.7.  Summary
    1.3.  Previous Work
    1.4.  Contributions and Outline of Thesis


Part II  Mathematical Aspects of Mappings

Chapter 2.  Abstract Machines and Mappings
    2.1.  Abstract Data Models
    2.2.  Abstract Mapping Models
    2.3.  Adding the Notion of Constraint
    2.4.  Nondeterministic Operations
    2.5.  Other Formalisms
    2.6.  Summary and Conclusions

Chapter 3.  Predicate Calculus and Mappings
    3.1.  First Order Languages
    3.2.  Predicate Calculus and the Relational Model
    3.3.  Examples of Consistency Derivations
    3.4.  Extensions of First Order Languages
    3.5.  Summary and Conclusions


Part III  Relational Structure Mappings

Chapter 4.  Formulation and Soundness of Rules for Relational State Mappings
    4.1.  A Relational Data Model and Mapping Model
    4.2.  Determining State Mappings
    4.3.  Soundness of the Derivation Rules
    4.4.  Summary and Conclusions

Chapter 5.  Completeness of the Rules for State Mappings
    5.1.  Incompleteness of the Rules and Modifications
    5.2.  Completeness of the Rules
    5.3.  Subset Constraints
    5.4.  Extensions to Many Levels
    5.5.  Summary and Conclusions


Part IV  Relational Operation Mappings

Chapter 6.  Operations and Operation Mappings
    6.1.  Relational Operations and Operation Mappings
    6.2.  Properties of Operation Mappings
    6.3.  Formulation and Soundness of Procedures for Weak Type 3 Operation Mappings
    6.4.  Completeness of Procedures
    6.5.  Summary and Conclusions

Chapter 7.  Further Properties of Operation Mappings
    7.1.  Medium and Strong Type 3 Mappings
    7.2.  Completeness of Procedures
    7.3.  Other Types of Operation Mappings
    7.4.  Summary and Conclusions


Part V  Concluding Remarks

Chapter 8.  Applications
    8.1.  Hierarchical Mappings as Relational Mapping
    8.2.  Summary and Conclusions

Chapter 9.  Contributions and Future Work

References

Appendix

Glossary




PART  I 


CHAPTER 1

Introduction


1.1. Statement of Problem


How can database management systems provide the widest possible range of
end-user  facilities?  This  is  the  overall  problem  dealt  with  in 
this  thesis. 


Many facets of this problem can be treated as problems
involving "mappings". In this thesis the term "mapping" will
have a very specific meaning to be presented in Chapter 2.
Informally, however, a mapping is a kind of black box which can
connect two different database interfaces. So, for example, one
instance of our general problem is to be able to connect Database
Language A, preferred by user 1, and Database Language B,
preferred by user 2, to a third interface, Database Language C:


[Figure: user 1 and user 2 each reach the shared data through a separate mapping.]


These connections should be transparent to the users. That is,
user 1 should neither be aware of user 2, nor of interface C.
Similarly, user 2 should not have to know that user 1 or
interface C exists. Updates by one user should not bother the
other. In the next section, we will motivate the general problem
and give more detailed examples of its manifestations.

1.2.  Motivation 

A  database  management  system  must  satisfy  a  variety  of  user 
needs.  Some  users  like  to  see  the  data  in  the  form  of 
hierarchies;  some  users  prefer  to  have  the  data  presented  as 
tables.  Users  with  a  high-level  authorization  may  need  to  see 
much more information about certain entities than the normal
user.  A  user  may  want  to  see  data  which  is  located  in 
geographically  separated  points  as  if  it  were  all  local  data. 

One way of effectively dealing with these requirements is by
using  the  concept  of  mapping.  To  give  a  better  idea  of  the  way 
in  which  mappings  can  accommodate  the  requirements  of  modern 
database  management  systems,  we  will  give  a  number  of  examples  of 
their  use. 

1.2.1. Data Translation

Data translation has been defined [SDTG] as the process whereby
data stored in a form that can be processed on one computer can
be used by the same or different processing systems on a possibly
different computer. The possible motives for carrying out a data
translation  are  several:  A  new  computer  system  may  have  been 
installed  which  is  incompatible  with  the  old  one;  the  enterprise 
may  be  moving  from  manual  or  computerized  files  to  a  database 
management  system;  the  move  may  be  from  one  database  system  to 
another,  or  there  may  need  to  be  a  reorganization  of  the 
underlying  data  structures  for  efficiency  reasons. 

Data  translation  is  inherently  a  mapping  process.  Although  it 
is  an  off-line  process,  it  is  logically  very  similar  to  state 
mappings  (to  be  discussed). 

Below we give an example of a data translation using
XPRS [SHTG]. XPRS is a language for reorganizing hierarchically
structured data. The original database is organized according to
two  hierarchical  definition  trees,  one  with  four  nodes  and  the 
other  with  a  single  node: 




Dept[Dept#,Mgr,Budget]
    Emp[Emp#,Job]    Proj[Proj#,Leader]
        Equip[Item#,Desc]

Projects[Proj#,Title,Location]

Suppose  it  is  necessary  to  create  a  Proj-dept  database  for 
projects  located  in  'San  Jose'.  This  database  should  have  the 
following  structure: 

Proj-dept[Proj#,Title]
    Dept[Dept#,Mgr,Budget,Leader]
        Equip[Item#,Desc]

The  following  XPRS  operations  will  accomplish  the  restructuring: 

T1 = slice(Dept#, Mgr, Budget, Proj from Dept)
T2 = sort(T1 by Proj#, Dept#)
T3 = graft(T2 onto Projects where T2.Proj#=Projects.Proj#)
T4 = select(from T3 where Location='San Jose')
Proj-dept = consolidate(T4)

The  application  programs  which  now  will  be  run  against  the 
Proj-dept  database  will  expect  all  the  usual  properties  of  a 
hierarchy  to  hold.  It  would  be  helpful  in  this  example  and 
essential  in  more  complicated  cases  to  have  formal  tools  and 
algorithms  to  ensure  that  the  translation  does  indeed  result  in 
the  expected  structure. 

1.2.2.  Multiple  View  Support 

Multiple  view  support  is  a  facility  whereby  a  database  can 
appear  in  a  different  form  for  different  users[KlTs].  It  can  be 
thought  of  as  a  protection  facility:  Having  multiple  views  may 
protect  users  from  the  complexities  of  a  large  database.  It  may 
also  provide  protection  for  the  database  from  malicious  or 
ignorant users. Views mask off various parts of the database;
they also provide a customized window for the user to look
through.

As an example of the use of views, consider the following two
relations: 


Emp

Name      Addr         Dept
------    ---------    -----
Smith     15 Bloor     Paint
Jones     25 King      Toy
Carter    171 Yonge    Paint

Dept

Name      Loc      Mgr
-----     -----    -----
Paint     King     Miles
Toy       Queen    Quin
Pipe      King     Miles

We could define a view on these relations by an SQL-like [ABCE]
statement. SQL is a relational data language in which tables
(relations) as above can be merged by matching values in columns.
For example:


Employee <- select Emp.Name, Emp.Addr, Dept.Mgr
            where Emp.Dept = Dept.Name
/*  Match  rows  of  Emp  with  rows  of  Dept  having 
the  same  department  name  and  keep  the  Emp 
Name  and  Addr  columns  and  the  Dept  Mgr  column  */ 

For the particular states of the Emp and Dept relations given,
this view would appear:


Employee

Name      Addr         Mgr
------    ---------    -----
Smith     15 Bloor     Miles
Jones     25 King      Quin
Carter    171 Yonge    Miles

Suppose a user wanted to get an answer to the query "Who is
the manager of Smith?" If the user were querying the base
relations Emp and Dept, a natural way to write the query would
be:


dept <- select Emp.Dept where Emp.Name='Smith'
answer <- select Dept.Mgr where Dept.Name=dept


If the user were interacting with the Employee view, however,
only a single simple query would be needed:

answer <- select Employee.Mgr where Employee.Name='Smith'

It  is  a  general  property  of  views  that  they  allow  queries  to  be 
formulated  more  simply  than  is  possible  for  queries  on  base 
relations  (or  segments,  or  records,  etc.).  This  is  because  views 
tailor  the  database  to  the  users'  needs. 
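
As an illustration only, the Employee view and the two ways of asking for Smith's manager can be sketched in Python. The encoding of relations as lists of dictionaries and the function names `employee_view`, `manager_via_base` and `manager_via_view` are assumptions made for this sketch; they are not part of SQL or of the formalism developed later in the thesis.

```python
# Base relations from the example, encoded as lists of dicts (an assumed encoding).
EMP = [
    {"Name": "Smith", "Addr": "15 Bloor", "Dept": "Paint"},
    {"Name": "Jones", "Addr": "25 King", "Dept": "Toy"},
    {"Name": "Carter", "Addr": "171 Yonge", "Dept": "Paint"},
]
DEPT = [
    {"Name": "Paint", "Loc": "King", "Mgr": "Miles"},
    {"Name": "Toy", "Loc": "Queen", "Mgr": "Quin"},
    {"Name": "Pipe", "Loc": "King", "Mgr": "Miles"},
]


def employee_view(emp, dept):
    """Match Emp rows with Dept rows on department name; keep Name, Addr, Mgr."""
    return [
        {"Name": e["Name"], "Addr": e["Addr"], "Mgr": d["Mgr"]}
        for e in emp
        for d in dept
        if e["Dept"] == d["Name"]
    ]


def manager_via_base(emp, dept, name):
    """Two-step query against the base relations Emp and Dept."""
    depts = {e["Dept"] for e in emp if e["Name"] == name}
    return {d["Mgr"] for d in dept if d["Name"] in depts}


def manager_via_view(view, name):
    """Single-step query against the Employee view."""
    return {row["Mgr"] for row in view if row["Name"] == name}
```

Both `manager_via_base(EMP, DEPT, "Smith")` and `manager_via_view(employee_view(EMP, DEPT), "Smith")` yield the same answer; the second needs no knowledge of the Dept relation.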

The  ability  to  support  multiple  views  is  clearly  a  desirable 
property. However, a number of problems arise when users are
able  to  interact  with  views  as  if  they  represented  the  "real" 
database. 

If  a  user  of  the  Employee  view  believes  that  each  employee  has 
only  one  manager  (which  is  true  in  the  given  example)  ,  then  in 
order  for  this  view  to  be  "consistent"  with  the  user's 
expectations,  there  must  be  a  certain  amount  of  structure  on  the 
base  relations.  In  this  example,  there  must  be  a  functional 
relationship  (functional  dependency;  see,  e.g.,  [Bern75])  from 
Name to Dept in Emp and from Name to Mgr in Dept. For suppose
the functional dependency Name -> Dept in Emp did not hold; say there
was also a tuple <Smith, 15 Bloor, Toy> in Emp. Then both
<Smith, 15 Bloor, Miles> and <Smith, 15 Bloor, Quin> would appear
in Employee, and this would contradict the user's expectations
since 'Smith' appears with both 'Miles' and 'Quin'. A similar
problem would arise if the Dept relation violated the Name -> Mgr
functional dependency.
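
The counterexample can be checked mechanically. The sketch below is a minimal illustration under assumed names (`satisfies_fd`, `employee_view`, `BAD_EMP`); it tests a single functional dependency on a given state and is not the derivation procedure of Chapter 4.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if each value of attribute lhs is paired with only one value of rhs."""
    seen = {}
    for row in rows:
        # setdefault records the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def employee_view(emp, dept):
    """Join Emp and Dept on department name; keep Name, Addr, Mgr."""
    return [
        {"Name": e["Name"], "Addr": e["Addr"], "Mgr": d["Mgr"]}
        for e in emp
        for d in dept
        if e["Dept"] == d["Name"]
    ]


EMP = [
    {"Name": "Smith", "Addr": "15 Bloor", "Dept": "Paint"},
    {"Name": "Jones", "Addr": "25 King", "Dept": "Toy"},
    {"Name": "Carter", "Addr": "171 Yonge", "Dept": "Paint"},
]
DEPT = [
    {"Name": "Paint", "Loc": "King", "Mgr": "Miles"},
    {"Name": "Toy", "Loc": "Queen", "Mgr": "Quin"},
    {"Name": "Pipe", "Loc": "King", "Mgr": "Miles"},
]

# The extra tuple from the text, which breaks Name -> Dept in Emp.
BAD_EMP = EMP + [{"Name": "Smith", "Addr": "15 Bloor", "Dept": "Toy"}]
```

With the original Emp the view satisfies Name -> Mgr; with `BAD_EMP` it does not, because Smith then appears with both Miles and Quin.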

In  general,  we  must  be  able  to  tell  what  structures  the  view 
will  have,  knowing  the  structure  of  the  underlying  database. 

This  is  exactly  the  problem  solved  in  Chapter  4. 

Another  problem  with  views  is  how  to  handle  updates.  In  the 
above  example,  suppose  a  user  of  the  Employee  view  wanted  to 
change  Smith's  manager  from  Miles  to  Quin.  That  is,  the  user 
wants  to  issue  the  command: 

update Employee.Mgr := 'Quin' where Employee.Name='Smith'

Now  the  Employee  relation  does  not  really  "exist"  —  it  is  a 
virtual object. Hence we must translate (with an operation
mapping) the above command to commands on the base relations.
But does this mean that the update on the base relations should
be

update Emp.Dept := 'Toy' where Emp.Name='Smith',   (*)

or does it mean that we want

update Dept.Mgr := 'Quin' where Dept.Name='Paint' ?   (**)

That is, do we mean to change the department Smith works in, or
do we mean to change the manager of Smith's department? The
problem is partly one of "real world" semantics and partly a
mapping problem. In this example, there would be general
agreement that the most logical interpretation of the update on
the Employee view would be the update on the Dept attribute of
Emp. This opinion is predicated on an understanding of
"employees", "departments" and "managers". This aspect of the
view update problem will not be dealt with in this thesis since
it is more in the realm of Artificial Intelligence. However, the
mapping aspect of updates to views is an appropriate problem to
study. We will be investigating ways to assure, for example,
that the above operations (*) and (**) really cause the Smith
tuple in the Employee view to have a Mgr value of 'Quin'.
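
A small sketch can make the check concrete for this one database state: apply each candidate base update, recompute the view, and inspect Smith's Mgr value. The function names `update_star`, `update_double_star` and `managers` are assumptions of this sketch, and testing one state is of course far weaker than the general conditions studied later.

```python
EMP = [
    {"Name": "Smith", "Addr": "15 Bloor", "Dept": "Paint"},
    {"Name": "Jones", "Addr": "25 King", "Dept": "Toy"},
    {"Name": "Carter", "Addr": "171 Yonge", "Dept": "Paint"},
]
DEPT = [
    {"Name": "Paint", "Loc": "King", "Mgr": "Miles"},
    {"Name": "Toy", "Loc": "Queen", "Mgr": "Quin"},
    {"Name": "Pipe", "Loc": "King", "Mgr": "Miles"},
]


def employee_view(emp, dept):
    """Join Emp and Dept on department name; keep Name, Addr, Mgr."""
    return [
        {"Name": e["Name"], "Addr": e["Addr"], "Mgr": d["Mgr"]}
        for e in emp
        for d in dept
        if e["Dept"] == d["Name"]
    ]


def update_star(emp, dept):
    """(*): move Smith to the Toy department; Dept is left unchanged."""
    new_emp = [dict(e, Dept="Toy") if e["Name"] == "Smith" else dict(e) for e in emp]
    return new_emp, [dict(d) for d in dept]


def update_double_star(emp, dept):
    """(**): make Quin the manager of Paint; Emp is left unchanged."""
    new_dept = [dict(d, Mgr="Quin") if d["Name"] == "Paint" else dict(d) for d in dept]
    return [dict(e) for e in emp], new_dept


def managers(view):
    """Name -> Mgr lookup; assumes Name -> Mgr holds in the view."""
    return {row["Name"]: row["Mgr"] for row in view}
```

On this state both translations give Smith the manager Quin in the view. They differ in their effect on other view tuples: (**) also changes Carter's manager, since Carter works in the Paint department as well.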

Thus  we  see  that  when  there  are  many  users  present,  and  when 
some of these users can update the database through their views,
it  becomes  essential  to  understand  exactly  the  means  (i.e., 
mappings)  by  which  the  views  are  created  and  maintained. 

1.2.3.  Multiple  Model  Support 

Multiple model support [Nijs77] allows interaction with the
database through several data models along with their associated
data languages. The reasons for supporting multiple data models
are similar to those for supporting multiple views: Different
users have different preferences; different groups of users may
traditionally be oriented towards one or another data language.

In addition, the needs of different applications may be met
better by one data model than another. If Parts are always
accessed, say by Inventory Control, through their Suppliers, the
most convenient data model may be hierarchical with a Supplier
segment having a Part descendant. If Parts and Suppliers are
accessed at random by, say, Auditing, then a relational
representation with Part and Supplier relations may be best.


The  following  example  shows  three  descriptions  in  three 
different  data  models  of  (approximately)  the  same  database: 

(i) hierarchical

Project[J#,Name,Loc]
    Supplier[S#,Name,Loc]
        Part[P#,Name,Wt,Qty]


(ii) network

Project[J#,Name,Loc]    Supplier[S#,Name,Loc]    Part[P#,Name,Wt]
        |                        |                      |
        | J-S                    | S-S                  | P-S
        |                        |                      |
        +------------------ Supply[Qty] ----------------+


(iii) relational

Project[J#,Name,Loc]
Supplier[S#,Name,Loc]
Part[P#,Name,Wt]
Supply[J#,S#,P#,Qty]

As with multiple view support, the support of multiple data
models requires that there be a mapping facility in the database
system. One natural way to have access to the database through
multiple data models is to map all user data models to one
central (conceptual) data model. A "data model mapping" must
specify the correspondence of schemas (descriptions) and also of
operations. For example, suppose a database system supports a
relational user model above a central hierarchical model. A
mapping model for this situation would specify how to translate a
hierarchical schema to a relational one. This specification
would produce schema (iii) above from schema (i) by some kind of
normalization process [Codd70]. This mapping model would also
specify how to translate the relational data language to the
hierarchical data language. Consider a relational query on
schema (iii) such as the following SQL query:

select Project.Name, Part.Name
where Project.Loc = 'Toronto' and
      Supply.S# = 'S8' and
      Supply.J# = Project.J# and
      Supply.P# = Part.P#

/*  Get  the  project  and  part  names  for  projects 
located  in  Toronto  and  where  the  parts 
are  supplied  to  those  projects  by  supplier  S8  */ 

This query might be translated into a hierarchical query such as
the following one written in HQL, a casual-user oriented
hierarchical query language [Fehd]:

for each Supplier having S#='S8'
    and in Project having Loc='Toronto'
list (Name of Project), (Name for each Part)

/*  Select  all  the  Supplier  segments  for  supplier  S8 
whose  Project  parent  is  in  Toronto,  and  list  the 
Project's  Name  and  the  Name  of  all  the 
subordinate  Parts.  */ 

One  class  of  mapping  problems,  then,  is  the  determination  of 
whether such a query translation will give the desired result.

We  may  distinguish  two  other  difficult  aspects  of  operation 
mappings  between  data  models. 

We  may  classify  data  model  operations  as  either  set-oriented 
or  record-oriented.  Set-oriented  operations  operate  on  sets  of 
data  objects  and  may  produce  results  which  are  also  sets.  All 
the examples of operations given so far are set-oriented.
Record-oriented operations take single objects as arguments and
produce results which are also single objects.

When the central or lower data model has set-oriented
operations, it is hard to translate record-oriented operations at
the user level [KlTs]. The following Codasyl-like query uses
record-oriented operations:


      find first Project where Loc='Toronto'
      goto 1
3:    find next Project where Loc='Toronto'
1:    if no more then stop
      find first Supply via J-S
      goto 2
4:    find next Supply via J-S
2:    if no more then goto 3
      find S-S owner
      add Supplier.Loc to list
      goto 4

/*  Get  the  locations  of  all  suppliers 
supplying  projects  in  Toronto  */ 

This query could be expressed in a set-oriented language
such as LSL [Tsic]:

select Project where Loc='Toronto'
link via J-S
link via S-S keep Loc
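
The contrast between the two styles can be sketched in Python. The sample data and the function names below are hypothetical, introduced only for illustration; the first function walks records one at a time in the manner of the Codasyl-like program, while the second applies three whole-set steps in the manner of the LSL query.

```python
# Hypothetical sample data: each Supply record names its J-S and S-S owners.
PROJECTS = {
    "J1": {"Name": "Bridge", "Loc": "Toronto"},
    "J2": {"Name": "Tunnel", "Loc": "Ottawa"},
    "J3": {"Name": "Tower", "Loc": "Toronto"},
}
SUPPLIERS = {
    "S1": {"Name": "Acme", "Loc": "Montreal"},
    "S2": {"Name": "Bolt", "Loc": "Toronto"},
    "S3": {"Name": "Crane", "Loc": "Windsor"},
}
SUPPLY = [
    {"J#": "J1", "S#": "S1", "Qty": 10},
    {"J#": "J1", "S#": "S2", "Qty": 5},
    {"J#": "J2", "S#": "S3", "Qty": 7},
    {"J#": "J3", "S#": "S1", "Qty": 2},
]


def locations_record_oriented():
    """One record at a time, mirroring the find first / find next loops."""
    result = []
    for jno, project in PROJECTS.items():        # find first/next Project
        if project["Loc"] != "Toronto":
            continue
        for supply in SUPPLY:                     # find first/next Supply via J-S
            if supply["J#"] != jno:
                continue
            supplier = SUPPLIERS[supply["S#"]]    # find S-S owner
            result.append(supplier["Loc"])        # add Supplier.Loc to list
    return result


def locations_set_oriented():
    """Whole sets at a time, mirroring select / link via / link via keep."""
    toronto = {j for j, p in PROJECTS.items() if p["Loc"] == "Toronto"}  # select
    supplies = [s for s in SUPPLY if s["J#"] in toronto]                  # link via J-S
    return {SUPPLIERS[s["S#"]]["Loc"] for s in supplies}                  # link via S-S keep Loc
```

The record-oriented version may list a location more than once, since it appends one entry per Supply record visited; the set-oriented version returns each location once.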

On the other hand, the following query (expressed in English) is
not so set-oriented:

Find a sequence S0, S1, ..., Sn of Suppliers such that

(i)   S0 Supplies a Project Located in New York,

(ii)  Si Supplies a Project in a location in which is
      Located a Project Supplied by S(i+1),
      for i=0,...,n-1, and

(iii) Sn Supplies a Project Located in Toronto.

It would be very difficult to express this query in a
set-oriented data language. Yet if the central data model is
set-oriented, and a user data model is record-oriented, these kinds
of translations must be made.
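
To see why this query resists a fixed set-oriented expression, note that the length n of the sequence is not known in advance, so some form of iteration is needed. The following sketch, under assumed data and an assumed function name `supplier_chain`, finds one such sequence by breadth-first search; it is an illustration, not a construct of any of the data languages mentioned.

```python
from collections import deque

# Hypothetical data: supplier -> locations of the projects it supplies.
SUPPLIES = {
    "S1": {"New York", "Boston"},
    "S2": {"Boston", "Chicago"},
    "S3": {"Chicago", "Toronto"},
    "S4": {"Vancouver"},
}


def supplier_chain(supplies, start_loc, end_loc):
    """Return a shortest sequence S0..Sn satisfying (i)-(iii), or None."""
    starts = [s for s, locs in supplies.items() if start_loc in locs]  # (i)
    parent = {s: None for s in starts}
    queue = deque(starts)
    while queue:
        s = queue.popleft()
        if end_loc in supplies[s]:                                     # (iii)
            chain = []
            while s is not None:
                chain.append(s)
                s = parent[s]
            return chain[::-1]
        for t, locs in supplies.items():
            # (ii): s and t both supply some project in a common location
            if t not in parent and supplies[s] & locs:
                parent[t] = s
                queue.append(t)
    return None
```

The `while` loop runs for as many rounds as the data requires, which is the part that a fixed composition of set operations cannot express directly.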

The  other  difficult  problem  involves  updates.  Different  data 
models  have  different  conventions.  For  example,  suppose  a  Part 
segment  is  deleted  from  the  hierarchical  schema  (i)  above. 

Should  the  users  of  schemas  (ii)  and  (iii)  see  this  deletion,  and 
if so, how should they see it? Whether they see it or not
depends on semantics of the applications not apparent in the
schemas. Whatever the case, there must still be a mapping
facility  to  express  the  effect  of  the  deletion  in  the  other  data 
models. 

If  it  is  decided  that  the  deletion  should  also  be  visible  in 
the  other  models,  there  is  still  the  question  of  the  form  the 
deletion  should  take.  For  example,  in  the  network  schema,  the 
hierarchical  deletion  could  be  reflected  as  a  deletion  of  a  Part 
record,  a  deletion  of  a  Supply  record,  the  deletion  of  both  a 
Part  record  and  a  Supply  record,  or  a  disconnection  of  a  Supply 
record from a P-S set. Again, mapping machinery is needed to
express  precisely  these  questions  and  to  provide  answers. 

1.2.4. ANSI/SPARC Architecture

The  increasing  importance  of  (computerized)  database 
management  systems  in  our  technological  society  was  recognized 
some  time  ago  by  the  American  National  Standards  Institute 
(ANSI).  In  1972  the  Standards  Planning  and  Requirements 
Committee  (SPARC)  of  ANSI/X3  (Computers  and  Information 
Processing)  established  a  study  group  to  investigate  the 
potential  for  standardization  in  the  area  of  database 
management [ANSI75,77]. In the course of its deliberations, the
ANSI/X3/SPARC Study Group on Database Management Systems (abbr.
ANSI/SPARC) proposed a framework for database systems. A portion
of  the  framework  follows: 


[Figure: portion of the ANSI/SPARC framework. Persons in roles: enterprise
administrator, database administrator, application administrator,
application programmer. Processing functions: conceptual schema processor,
internal schema processor, external schema processor, DD/D,
internal/storage transform, conceptual/internal transform,
external/conceptual transform, external database application program.
Numbered interfaces connect these elements.]


The  main  components  of  the  framework  are  a  set  of  persons  in 
roles,  a  set  of  processing  functions  and  a  set  of  interfaces 
(numbered  in  the  diagram)  among  elements  of  these  sets.  The 
framework  is  partitioned  into  three  realms  or  levels:  the 
internal,  the  conceptual  and  the  external.  At  the  internal  level 
there  is  one  database  described  by  one  schema.  At  the  conceptual 
level  there  is  also  one  database  described  by  one  schema.  At  the 
external  level  there  are  any  number  of  external  databases,  each 
described  by  a  schema. 

The  internal  level  contains  performance  and  other  computer- 
oriented  objects.  For  example,  an  internal  data  model  may  have 
as  objects  direct  files,  indices  and  pointer  arrays.  It  is 
sometimes said that the internal database is what is stored on
the computer. This may or may not be true, depending on one's
point of view and also on the system architecture. For example,
direct access files may be implemented as B-trees which in turn
reside  on  a  paged  data  set  which,  on  the  disc,  contains  error 
checking  bits.  The  framework  draws  the  line  (which  is  expected 
to  move  with  time)  between  database  system  concerns  and  computer 
concerns.  In  any  case,  this  is  the  lowest  "abstract  machine" 
level  directly  involved  in  database  management. 

If  one  considers  that  the  framework  describes  an  "onion"  of 
nested  machines,  then  the  conceptual  level  will  be  the  next  layer 
out  from  the  internal  level.  The  conceptual  database  contains 
the  same  information  as  the  internal  database,  but  its  motivation 
is  different.  The  objects  of  the  conceptual  database  model  the 
entities  of  interest  to  the  enterprise  (the  defined  environment 
in  which  the  database  system  will  operate).  The  imposition  of 
the  conceptual  level  between  the  external  and  internal  levels 
allows  both  of  these  latter  levels  to  evolve,  for  their  own 
reasons,  without  one  unnecessarily  affecting  the  other.  The 
conceptual  level  provides  a  mechanism  for  the  centralized  control 
of  the  use  and  content  of  the  database.  Questions  concerning  the 
necessary  and  desirable  features  of  the  conceptual  level  are 
being actively researched at this time (e.g., [Kent], [Schd]).

The  outermost,  external  level  provides  application-oriented 
views  of  the  database.  The  role  of  this  level  is  to  provide  each 
application (e.g., payroll, marketing, R&D) or application family
with  the  portion  of  the  database  it  needs  in  a  form  most  suitable 
to  it.  The  most  suitable  form,  that  is,  data  model,  may  in 
principle  consist  of  Fortran  arrays  or  complicated  semantic 
networks or any other structure class.

The framework proposed by ANSI/SPARC provides a structure
within  which  multiple  views  and  multiple  data  models  can  be 
supported.  It  allows  a  high  degree  of  data  independence  and  of 
control  over  the  database  and  facilitates  dynamic  reorganization 
(data  translation).  It  also  can  form  a  basis  for  well-structured 
distributed  databases. 

Mappings  are  essential  to  this  framework.  The  linking  of  the 
three  levels  —  the  internal,  the  conceptual  and  the  external  — 
is  accomplished  via  mappings. 




We have seen some of the problems arising from supporting
multiple  views  and  multiple  data  models.  Another  class  of 
problems  within  this  framework  relates  to  the  use  of  mappings  for 
database  control. 

Consider  the  following  conceptual  schema  in  which  employees 
are  assigned  to  projects  which  are  associated  with  departments: 

Emp[E#,Name,J#]
Proj[J#,Name,Security#,D#]
Dept[D#,Name,Mgr]

An  external  schema  might  contain  the  relation: 

Employee[E#,Name,Mgr],

and users of this schema might be given read access to the E# and
Name fields of the Emp conceptual relation and to the Mgr field
of the Dept conceptual relation. Now consider the following
mapping:

Employee <- select Emp.E#, Emp.Name, Dept.Mgr
            where Emp.J# = Proj.J# and
                  Proj.D# = Dept.D#

According to this mapping, the query

select Employee.Mgr where Employee.Name='Wong'

would  be  translated  as: 

select Dept.Mgr
where Emp.Name = 'Wong' and
      Emp.J# = Proj.J# and
      Proj.D# = Dept.D#

This  query  requires  access  to  the  Proj  relation.  Yet  as  long  as 
the  external  queries  are  translated  by  the  above  mapping,  the 
user  will  never  see  any  part  of  the  Proj  relation.  There  is  a 
problem,  then,  to  devise  the  appropriate  access  formalism  which 
will  allow  properly  translated  queries  and  updates  access  to 
conceptual  objects  to  which  the  user  nominally  has  no  access. 
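
The tension can be illustrated with a small sketch. The two access policies below, and the names `allowed_strict` and `allowed_through_mapping`, are hypothetical and serve only to show the difficulty; the text does not prescribe either one.

```python
# Fields the external user has been granted read access to (from the example).
GRANTED = {("Emp", "E#"), ("Emp", "Name"), ("Dept", "Mgr")}

# The translated query, split into the field it returns and the fields it
# reads only to evaluate the selection and the join path.
RETURNED = {("Dept", "Mgr")}
READ_INTERNALLY = {
    ("Emp", "Name"), ("Emp", "J#"),
    ("Proj", "J#"), ("Proj", "D#"),
    ("Dept", "D#"),
}


def allowed_strict(returned, read_internally, granted):
    """Hypothetical policy A: every field the query touches must be granted."""
    return (returned | read_internally) <= granted


def allowed_through_mapping(returned, read_internally, granted):
    """Hypothetical policy B: only returned fields must be granted; fields
    read internally by the mapping are trusted and therefore not checked."""
    return returned <= granted


def ungranted_fields(returned, read_internally, granted):
    """Fields the translated query touches without an explicit grant."""
    return sorted((returned | read_internally) - granted)
```

Policy A rejects the translated query because of the join fields, including those of Proj, while policy B accepts it. An access formalism of the kind called for above has to justify something like policy B without simply trusting every mapping.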




The ANSI/SPARC framework provides a firm foundation on which
to base well-planned standardization programs. Mappings play an
essential  role  in  this  framework. 


1.2.5. Database Terminals

Intelligent terminals are steadily becoming more powerful and
less expensive. It has been suggested [Vass] that these terminals
could contain their own mapping processors, hence the name
"database terminal". This concept is in a sense the dual of the
concept of multiple model, multiple view support. In the latter
cases, one database provides many different interfaces to many
different users; while in the former case, one user is provided
with one interface to many different databases.

Suppose, for example, that user U prefers DSL-Alpha [Codd72a],
a predicate calculus oriented language, and that the database
systems that U needs to use are Codasyl- and IMS-based. First of
all, U needs to see the network and hierarchical schemas in a
relational format. Consider the following example hierarchical
schema:


Project[Name,Loc]
    |
Supplier[Name,Loc]
    |
Part[Name,Wt]


One of the mapping functions of the database terminal would be to
transform this schema to a relational one (by
normalization[Codd72b]):


Project[Name,Loc]
Supplier[Pname,Name,Loc]
Part[Pname,Sname,Name,Wt]


As  we  can  see,  such  a  processor  would  need  a  knowledge,  not  only 
of  normalization,  but  also  of  how  to  rename  attributes. 
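The normalization step with renaming can be sketched as follows. This is an illustration only. It assumes that Name is the key of every segment and that an inherited key is renamed by prefixing the first letter of the owning segment (Pname, Sname). Both assumptions are read off the example, not taken from a stated algorithm, and the function names are hypothetical.

```python
# (segment name, own attributes, parent segment or None), as in the example schema.
HIERARCHY = [
    ("Project", ["Name", "Loc"], None),
    ("Supplier", ["Name", "Loc"], "Project"),
    ("Part", ["Name", "Wt"], "Supplier"),
]


def inherited_key_name(segment, key_attr):
    """Assumed renaming convention: Project.Name -> Pname, Supplier.Name -> Sname."""
    return segment[0] + key_attr.lower()


def normalize(hierarchy, key_attr="Name"):
    """Flatten a hierarchy by copying every ancestor key into each descendant."""
    inherited = {}   # segment -> renamed keys it passes down to its children
    relations = {}
    for name, attrs, parent in hierarchy:   # parents must be listed before children
        prefix = inherited[parent] if parent else []
        relations[name] = prefix + attrs
        inherited[name] = prefix + [inherited_key_name(name, key_attr)]
    return relations


relations = normalize(HIERARCHY)
```

The one-letter prefix would produce a name clash in deeper hierarchies (for instance a child of Part would inherit Pname twice), so a real processor would need a more careful renaming rule.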


Consider  now  the  following  network  schema: 




                    P-D
Person[Name,Age] <------- Dept[Name,Budget]
      |
      | P-O
      V
Ownership[]
      ^
      | H-O
      |
House[Addr,Price]

The  database  terminal  must  be  able  to  display  this  network  schema 
as  the  following  relational  schema: 

Person[Name,Age,Dname]
Dept[Name,Budget]
Ownership[Name,Addr]
House[Addr,Price]

We see that this mapping module must not only know about renaming
attributes, it must also be able to tell when the key brought
down from a set owner to a set member should become part of the
member's key (as in Ownership) and when it should just become a
non-key domain (as with Dept.Name).
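One possible way to make that decision is sketched below. The rule used here is an assumption made only for illustration: an owner's key joins the member's key exactly when the member record type has no key of its own. The text does not state how the module should decide. The set names P-D, P-O and H-O and the record types are from the example, and the dictionary layout and function name are hypothetical.

```python
# Record types of the example network schema, with their own keys.
RECORDS = {
    "Person": {"attrs": ["Name", "Age"], "key": ["Name"]},
    "Dept": {"attrs": ["Name", "Budget"], "key": ["Name"]},
    "Ownership": {"attrs": [], "key": []},
    "House": {"attrs": ["Addr", "Price"], "key": ["Addr"]},
}
# (set name, owner record type, member record type)
SETS = [
    ("P-D", "Dept", "Person"),
    ("P-O", "Person", "Ownership"),
    ("H-O", "House", "Ownership"),
]


def to_relational(records, sets):
    """Bring each owner key down into the member relation."""
    relations = {}
    for name, rec in records.items():
        attrs, key = list(rec["attrs"]), list(rec["key"])
        keyless = not rec["key"]          # assumed rule: keyless members borrow owner keys
        for _set_name, owner, member in sets:
            if member != name:
                continue
            for k in records[owner]["key"]:
                # Rename on a clash, e.g. Dept.Name becomes Dname inside Person.
                new = k if k not in attrs else owner[0] + k.lower()
                attrs.append(new)
                if keyless:
                    key.append(new)
        relations[name] = {"attrs": attrs, "key": key}
    return relations


relations = to_relational(RECORDS, SETS)
```

Under this rule Ownership receives the key (Name, Addr), while Dname in Person stays a non-key domain, matching the relational schema shown above.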

These schema translations are functional, as opposed to the
conceptual-external mappings in a general ANSI/SPARC architecture
which are many-to-many. In other words, for every schema in the
main database, there is only one possible mapping to one possible
schema in the database terminal; while in an ANSI/SPARC database,
one conceptual schema may be mapped to many external schemas, and
one external schema may be bound to many conceptual schemas.
This means that when the schema in the main database changes, the
database terminal user automatically sees a corresponding new
schema in the terminal's model.

Queries  in  the  terminal's  data  model  are  translated  into 
queries of whichever database system is currently being accessed.
The  goals  are  to  minimize  the  communication  traffic  while  having 
the  new  data  model  interface  cause  no  effect  on  the  main  database 
systems. 




1.2.6.  Distributed  Databases 

Distributed database systems combine the features of computer
networks and of database management systems. A distributed
database on the one hand is an integrated collection of data, and
on the other hand, is composed of distinct, semi-autonomous parts
or nodes. Nodes of the computer network contain computer
systems, and these may be hosts for some or all of the components
of a database system. In order to integrate the description and
manipulation of the distributed data, mappings are needed.

Distributed  database  systems  are  subject  to  the  same  mapping 
problems  already  mentioned,  and  also  to  additional  problems 
arising  from  the  distribution  of  the  data.  Consider  for  example 
the  following  computer  network: 


Suppose each node has a relational database system[EpSW], and
suppose the schemas at each node are:

at A:  Part-Inv[P#,Qty,Loc,Owner] where Loc≠'Boston'
       Supplier[S#,Name,Addr]
at B:  Need[P#,J#,Qty]
       Part-Inv[P#,Qty,Loc,Owner] where Loc='Boston'
at C:  Project[J#,Loc,Budget] where Loc='Chicago'
at D:  Project[J#,Loc,Budget] where Loc≠'Chicago'

Suppose we are at node A and we issue the query:

select Supplier.Name, Supplier.Addr
where Supplier.S# = Part.Owner and
      Part.Qty > Need.Qty and
      Need.P# = Part.P# and
      Need.J# = Project.J# and
      Project.Loc = Part.Loc




This query gets information on suppliers who can supply parts to
projects from an inventory in the same location as the project.
Since not all of the needed relations are at site A, the query
must be mapped to a set of subqueries, each of which can be
processed at a single node, possibly receiving intermediate
results of other subqueries from other nodes. One possible
translation is:

send Project at C to J at B
add Project at D to J at B
NJ (at B) <- select J.Loc, Need.P#, Need.Qty
             where J.J# = Need.J#
send NJ at B to NJ at A
P (at A) <- Part
add Part at B to P at A
answer <- select Supplier.Name, Supplier.Addr
          where Supplier.S# = P.Owner and
                P.Qty > NJ.Qty and
                NJ.P# = P.P# and
                NJ.Loc = P.Loc

There are many other possibilities. Besides communication costs,
these mappings need to take into account the logical properties
of the distributed relations. For example, the mapping should
not waste effort on comparing Parts at B with Projects at C since
it is known from the schema that the comparison
Project.Loc = Part.Loc will fail.
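The pruning argument can be sketched as a small satisfiability test on the fragment predicates. This is an illustration under stated assumptions. The predicates at B (Loc = 'Boston') and C (Loc = 'Chicago') are taken from the schema. The complementary predicates at A and D, the tuple encoding and the function names are assumptions made for the example, and the Loc domain is assumed to contain more than two values.

```python
# Each fragment carries a predicate on Loc: ("=", value) or ("!=", value).
PART_FRAGMENTS = {"A": ("!=", "Boston"), "B": ("=", "Boston")}        # A is assumed
PROJECT_FRAGMENTS = {"C": ("=", "Chicago"), "D": ("!=", "Chicago")}   # D is assumed


def may_join_on_loc(p1, p2):
    """Return False only when no single Loc value can satisfy both predicates."""
    (op1, v1), (op2, v2) = p1, p2
    if op1 == "=" and op2 == "=":
        return v1 == v2
    if op1 == "=" or op2 == "=":
        # One side pins Loc to a value; the other side excludes exactly one value.
        return v1 != v2
    return True   # two "!=" predicates leave some value available (domain assumption)


def useful_pairs(parts, projects):
    """Site pairs whose fragments could contribute to Project.Loc = Part.Loc."""
    return sorted(
        (a, b)
        for a, p in parts.items()
        for b, q in projects.items()
        if may_join_on_loc(p, q)
    )


pairs = useful_pairs(PART_FRAGMENTS, PROJECT_FRAGMENTS)
```

The pair (B, C) is dropped without moving any data, which is the saving described in the text.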

There are other, more architectural problems relating to
mappings and distributed databases. These are questions
concerning the way in which distributed databases fit into an
ANSI/SPARC framework. Should the distribution of data be visible
to an external user? Is it a conceptual level concept, or should
the distribution information be placed in the internal schema as
an efficiency related concept?

1.2.7.  Summary 

The  facilities  demanded  from  advanced  database  management 
systems  include  such  things  as  multiple  view  support,  multiple 
model  support,  data  independence  and  a  framework  for  facilitating 
standardization.  These  needs  can  be  met  using  mappings.  Hence 
it is important to understand what exactly mappings are, what
properties they should have, and how these properties can be
determined.  Having  thus  motivated  the  study  of  database 
mappings,  we  will  next  discuss  what  work  has  already  been  done  in 
this  area. 

1.3.  Previous  Work 

There  has  been  little  previous  work  specifically  dealing  with 
mappings  as  we  understand  the  term.  However,  there  is  a  body  of 
literature  which  concerns  itself  with  various  isolated  aspects  of 
mapping  problems  or  with  slightly  different  problems.  We  will 
describe  these  works  in  this  section  and  relate  them  to  the 
contributions  of  this  thesis. 

We  have  classified  the  citations  into  five  groups.  The  first 
group  discusses  general  architectures  for  database  management 
systems  which  include  the  notion  of  mapping.  The  second  group 
describes  some  existing  systems,  both  commercial  and 
experimental,  which  at  least  include  the  concept  of  a  "view". 

The  third  set  of  papers  relates  to  mathematical  aspects  of 
mappings.  The  fourth  is  a  set  of  papers  which  present  algorithms 
relating  to  the  ones  appearing  in  this  thesis.  The  last  group 
describes  work  which  debates  the  merits  of  various  conceptual 
models,  that  is,  models  which  are  well-suited  for  representing 
the  overall  information  structure  and  for  defining  views. 

(i)  First we describe reports and papers on database
architecture.

In 1970 a joint working group of Guide and Share (the
commercial and scientific user groups of IBM products) published
a report[GuSh] outlining database management system requirements.
Among the facilities necessary to meet the objectives of a
database management system, the report included the resolution of
data into three levels: logical data used by application
programs, stored data which is the physical form of the database,
and entity data which is situated between the logical and stored
data and which provides data independence. (Note the similarity
with the external, internal and conceptual levels of ANSI/SPARC.)
The need for mappings to connect these three levels was
recognized in the report, but only a restricted form of structure
mappings was discussed in an informal manner, and operation
mappings were not discussed at all.

Bracchi, Fedeli and Paolini[BrFP] have discussed a multilevel
relational database system. Their paper used the term "mapping
model", but they did not actually consider any global qualities
of mappings at the same level of abstraction as "data model".
They considered some details of structure and operation mappings,
but they used terminology suited only to binary relations.

The ANSI/SPARC Database Study Group published two
reports[ANSI75,77] which detailed a general three level
architecture  for  database  management  systems.  We  discussed  this 
important  work  in  the  previous  section  of  this  chapter. 

G. M. Nijssen[Nijs76,77] has proposed an overall architecture
for database management systems which emphasizes the need to let
each user interact with the database through his/her preferred
data model and view. There were no formal definitions or
algorithms, but this work argued very well for the need for
database management systems with sophisticated mapping
processors, not merely subsetting processors.

(ii)  Now  we  discuss  several  existing  database  management 
systems  which  have  some  kind  of  mapping  facility. 

System R[ABCE] is an experimental relational database
management system. It has two facilities of a mapping nature:
views and triggers. Views define new relations from existing
ones by associating an SQL query (SQL is System R's data
language) with a relation name. This is actually a combination
of schema definition and mapping definition as we shall see in
this thesis. Triggers cause certain SQL operations to be
executed at the occurrence of some specified event. They can be
thought of as operation mappings.

INGRES[SWKH] is another experimental relational database
management system. In this system, views are also defined by
queries. Queries and updates to views are handled by query
modification according to the view definition. Thus there can be
at most one operation mapping for every view.




Information  Management  System  (IMS)  is  a  commercial  database 
management  system  marketed  by  IBM.  Data  in  this  system  must 
always  be  structured  hierarchically  (as  trees).  IMS  has  a 
limited  facility  for  defining  views  of  the  database.  Views  may 
omit subtrees of the original hierarchy, or some portions of the
tree  may  be  inverted.  There  are  complicated  built-in  rules  for 
updates  to  view  hierarchies. 

A  CODASYL  task  group  published  a  report  in  1971[Coda]  which 
made  specific  proposals  for  database  management  systems.  A 
number  of  current  commercial  systems  follow  most  or  all  of  the 
CODASYL model. There can be views in CODASYL systems, but the
view  schemas  can  only  differ  from  the  underlying  schema  by  the 
omission  of  items;  that  is,  view  schemas  are  always  subschemas. 
Naturally,  in  this  situation  operation  mappings  are  almost 
trivial. 

(iii)  The  next  group  of  papers  we  discuss  are  in  the  area  of 
mathematical  formalisms. 

The formalism for mappings most closely resembling ours is by
Paolini,  Pelagatti  and  Bracchi  ([PaPe]  and  [PePB]).  The 
formalism  is  based  on  abstract  mathematical  concepts:  Databases 
are  represented  by  abstract  algebras,  and  mappings  are 
represented  by  homomorphisms.  There  are  a  number  of  flaws  in 
their  work  which  we  discuss  in  Chapter  2. 

Robinson  and  Levitt  define  an  abstract  machine  and  mapping 
formalism for writing structured programs[RoLe]. Their formalism
is  not  intended  to  describe  database  mappings,  but  with  some 
modifications  it  agrees  well  with  the  formalism  described  in  this 
thesis.  (Again,  see  Chapter  2.) 

There is a fruitful relationship between relational database
concepts and Predicate Logic. Fagin[Fgin] uses a correspondence
of functional dependencies with implicational statements of
Propositional Logic to show the completeness of functional
dependency derivation rules. Wong and Mylopoulos[WoMy] have
noted how database constraints and database states are analogous
to Predicate Calculus axioms and models, respectively. In
Chapter 3 of this thesis we extend this correspondence to include
database mappings.




S.  Todd[Todd]  developed  a  formalism  for  giving  a  precise 
meaning  to  the  concept  of  automatic  maintenance  of  constraints 
under  updates  to  view  relations.  He  defines  an  ordering  on  the 
set  of  possible  database  changes,  and  declares  the  effect  of  an 
update  to  be  the  smallest  possible  database  change.  This  is  an 
interesting  approach,  but  it  lacks  the  flexibility  necessary  in 
many  database  contexts  (See  Chapter  6.),  and  it  may  seem 
counterintuitive.  On  the  other  hand,  the  formalism  is  defined 
well  and  can  lead  to  useful  theorems. 

Biller, Glatthaar and Neuhold ([BiGl] and [BiGN]) have used
concepts from denotational semantics (e.g. [deBa], [Dona75],
[Dona76], [Tenn]) to define the semantics of a CODASYL data
model.  In  the  concluding  part  of  this  thesis  we  suggest  a  more 
structured  approach  to  defining  mappings  and  data  models  with  the 
tools  of  denotational  semantics. 

(iv)  Next  we  consider  previous  work  contributing  specific 
algorithms  for  database  problems  directly  or  indirectly  related 
to  mappings. 

Armstrong[Arms] was one of the first people to contribute
substantive mathematical results for database problems. He
presented rules for calculating functional dependencies on a
relation and showed that the rules were both sound and complete.
In this thesis we start with Armstrong's rules and formulate some
rules for calculating constraints on arbitrarily complicated
relational expressions.
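A standard way to apply rules of this kind is the attribute-closure test: a dependency X -> Y follows from a given set of functional dependencies exactly when Y is contained in the closure of X. The sketch below shows that textbook procedure, not the rules formulated later in this thesis. The sample dependencies are hypothetical and merely reuse attribute names from the earlier examples.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency when its left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Does lhs -> rhs follow from the dependencies in `fds`?"""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependencies: E# -> J#, J# -> D#, D# -> Mgr.
FDS = [(("E#",), ("J#",)), (("J#",), ("D#",)), (("D#",), ("Mgr",))]
derived = implies(FDS, ("E#",), ("Mgr",))
not_derived = implies(FDS, ("D#",), ("E#",))
```

Here E# -> Mgr is derivable by transitivity, while D# -> E# is not.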

Aho, Beeri and Ullman[AhBU] studied the problem of recognizing
non-loss joins of relations and gave necessary and sufficient
conditions  for  this  property  to  hold.  This  theory  can  be  applied 
to  mapping  problems;  in  Chapter  5  we  indicate  how  the  algorithm 
would  need  to  be  used  in  the  presence  of  many  levels  of  mappings. 

Aho, Sagiv and Ullman[AhSU] studied the complexity of the
equivalence problem for relational expressions. Some procedures
we  give  in  Part  III  involve  the  equivalence  of  relational 
expressions.  Complexity  problems  are  not  dealt  with  in  this 
thesis,  but  this  work  can  be  applied  to  our  mapping  algorithms. 




Dayal and Bernstein[DaBe] have studied the updatability of
relational views. In their approach, a view relation can be
altered by tuple insertions, deletions or modifications only if
four conditions hold: There must be a unique translation of the
view update to updates on base relations; there can be no
extraneous updates on the base relations; there can be no side
effects on the view relations; and the resulting state must
violate no constraints. They also assume that insertions
(deletions, modifications) to the view are mapped to insertions
(deletions, modifications) on the base relations. For certain
cases, algorithms are given which test for the existence of
translations satisfying the above conditions.

Our  approach,  which  seeks  to  provide  answers  to  the  problems 
of  database  systems  conforming  to  the  ANSI/SPARC  framework,  does 
not place such strict conditions on view updates. This is
explained  fully  in  Part  IV. 

Furtado  and  Sevcik[FuSe]  also  discuss  the  allowing  of  updates 
on  relational  views.  They  give  a  table  of  transformations  of 
view  updates  to  updates  on  base  relations  where  the  view  is 
derived  using  relational  algebra  operators  (actually,  quotient 
algebra  operators).  Then  a  number  of  special  cases  of  views 
derived  from  specific  operator  sequences  are  considered. 

In  this  paper,  as  in  the  previous  one,  the  concept  of 
"allowable  update"  is  more  restrictive  than  ours.  The  table  of 
transformations  is  similar  to  the  "guidelines"  we  give  in  Chapter 
6,  but  we  have  recognized  that  in  the  context  of  the  ANSI/SPARC 
framework,  there  may  be  many  choices  in  translating  one  update, 
and  these  may  depend  on  the  semantics  of  the  view. 

This paper noted a number of conditions such as the "cross-term
condition" which must hold for updates to be allowable (in
the restrictive sense). These are similar to some of the terms
in our definitions concerning strong operation mappings given
in Chapter 7.

(v)  Lastly,  we  mention  work  that  has  been  done  on  conceptual 
data  models. 

Conceptual  data  models  are  intended  to  serve  two  functions: 
the  modelling  of  some  portion  of  the  real  world  and  the 
interfacing  of  the  internal  and  external  levels  of  a  database. 




There has been much work done on which features of a conceptual
data model are good or are not good for modelling the real world.
(See, for example, [Falk], [Kent], [Schd] and other papers in
[Nssn76] and [Nssn77].) However, there seems to have been only
one paper, by Bracchi, Paolini and Pelagatti[BrPP], which discussed
the merits of various conceptual data models from a mapping point
of view. We do not make any direct contributions to the
conceptual model debate, but this thesis will provide additional
objective criteria for conceptual model evaluation with respect
to mappings to other levels.

1.4.  Contributions and Outline of Thesis

This  thesis  is  organized  into  five  parts  labelled  by  Roman 
numerals; each part contains one or more Arabic numbered
chapters. The thesis is to a large part mathematical, and unless
otherwise  stated,  all  theorems  are  those  of  the  author. 

Part  I  consists  of  the  initial  introductory  chapter. 

This present chapter has provided an introduction to the needs
for mappings in database management systems. We described a
number of important database facilities in which mappings play an
essential role. The previous research in the area of database
mappings and in closely related areas was discussed, and we noted
its shortcomings. The overall goal of this thesis is to give
answers to the questions of what exactly mappings are, what kind
of properties they ought to have and under what conditions they
will have these properties.

Part II consists of Chapters 2 and 3 and describes
mathematical aspects of mappings. In Chapter 2 we present a
formal framework for mappings. This framework captures the
notions of data model, schema, state, operation, mapping model,
structure mapping, and operation mapping. This formalism is
better suited as a language for database mapping problems than
the only other one to have appeared in the literature: It is
simple; it is mathematically precise, and it can model the
important mapping concepts. With it we give precise definitions
of the properties of mappings which we show to be desirable from
an intuitive argument. For structure mappings, the main
desirable property is that the mapping, together with the
constraints in the underlying schema, will always cause the
constraints in the view schema to be satisfied. For operation
mappings, the main desirable property is that the observed effect
on the view of performing the operation translation on the
underlying database will always agree with the expected effect of
performing the original operation directly on the view state.
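As a rough sketch, the two properties just described can be written symbolically. The notation here is chosen only for illustration and is not necessarily that of Chapter 2: s is the underlying schema and v the view schema, with constraints C_s and C_v; St(s) is the set of states of s; f is the structure mapping taken as a function from underlying states to view states; q is an operation on the view; and \tau(q) is its translation to the underlying database.

```latex
% Structure mapping property: consistent underlying states map to consistent views.
\[
  \forall d \in St(s):\qquad d \models C_s \;\Longrightarrow\; f(d) \models C_v
\]

% Operation mapping property: translating and then viewing agrees with
% viewing and then operating directly on the view state.
\[
  \forall d \in St(s):\qquad f\bigl(\tau(q)(d)\bigr) \;=\; q\bigl(f(d)\bigr)
\]
```

The second condition says that the square formed by f, q and \tau(q) commutes, which is why the framework resembles mappings between abstract machines.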

Chapter 3 discusses the relationship of certain database
concepts with Predicate Calculus. We show how constraints
correspond to Predicate Calculus axioms, how schemas correspond
to theories, and how structure mappings correspond to defined
theories. Necessary and sufficient conditions for the desirable
structure mappings are taken from a theorem of Predicate
Calculus, and from the unsolvability of the "theoremhood"
property we get our first unsolvability result involving database
mappings. However, a positive result is that we have an
equivalent formulation, albeit an undecidable one in its full
generality, for the structure mapping property of never violating
view constraints. This formulation can be used with virtually any
data model and mapping model whatever.

Part III of the thesis, consisting of Chapters 4 and 5,
studies the problems of structure mappings in a relational
context. Although this data model is specialized, it represents
a core of concepts to be found in almost all data models. (We
illustrate this in Chapter 8.) We study the problem of
determining the basic property that a structure mapping satisfies
the view constraints given that the underlying constraints are
satisfied. When unrestricted relational algebra is used as the
mapping language, this problem is unsolvable. This
undecidability is removed by omitting the set difference operator
from the mapping language. We formulate a set of rules for the
problem in Chapter 4, and we show that they are sound, i.e., that
the rules will not say a mapping has the property when it really
does not.

Chapter 5 studies the question of the completeness of the
rules for determining the desirable structure mappings. The
rules are complete if, when they say that a mapping does not have
the property, it really does not. After making a second
adjustment  to  the  mapping  language,  we  can  show  that  the  rules 
are  complete.  In  one  of  the  later  sections  of  Chapter  5,  we 
extend  the  class  of  constraints  allowed  in  the  data  model  to 
include  so-called  subset  constraints  or  "foreign  key" 
constraints.  This  allows  the  data  model  to  represent  the  other 
two  important  classes  of  data  models,  namely,  hierarchical  and 
network  data  models.  We  indicate  how  the  rules  can  be  extended 
to  cover  subset  constraints.  The  last  section  of  this  chapter 
also  considers  another  possible  extension  to  the  rules.  This 
would  allow  the  rules  to  apply  to  more  than  one  level  of  mapping. 

The  contributions  of  Part  III  to  the  understanding  of  database 
management  systems  are  several.  At  a  general  level,  we  show  that 
it  is  in  principle  possible  to  build  some  of  the  processors 
envisioned in the ANSI/SPARC framework. This is a positive
result. On the other hand, we show by the undecidability results
that  for  certain  cases,  these  processors  cannot  be  built. 

At  a  more  specific  level,  we  give  a  sound  and  complete 
algorithm  for  deciding  when  a  relational  algebra  structure 
mapping has the property of mapping states to states. Most other
data  models  will  contain  the  features  of  our  relational  model, 
perhaps  in  a  different  form.  Hence,  our  algorithms  can  be 
translated  to  apply  to  many  other  situations  than  the  actual  one 
studied.  This  is  especially  true  when  the  rules  are  extended  to 
include  subset  constraints. 

Part  IV  of  the  thesis,  consisting  of  Chapters  6  and  7,  studies 
a  class  of  operation  mappings.  Chapter  6  first  defines  the 
operations  and  the  operation  mappings.  We  generalize  the  usual 
semantics  for  relational  operations,  and  show  the  usefulness  of 
the  generalization.  For  the  weakest  form  of  these  semantics,  we 
investigate  the  problem  of  determining  when  an  operation  mapping 
gives  a  correct  interpretation  of  the  operation.  Curiously,  we 
find that this operation mapping problem is undecidable because
of  the  generality  of  the  state  mapping  language.  We  again  omit 
the  set  difference  operator,  and  this  allows  rules  to  be 
formulated  which  are  both  sound  and  complete. 




Chapter  7  uses  the  results  of  Chapter  6  to  study  correctness 
of  operation  mappings  relative  to  two  stronger  kinds  of 
semantics.  We  are  able  to  formulate  a  sound  set  of  rules,  but  it 
is  not  possible  (at  this  time)  to  make  them  complete.  We  find 
that  all  but  the  weakest  restrictions  placed  on  views  require 
quite strong restrictions to be placed on the underlying
database.  We  discuss  how  the  join  operation  makes  it  difficult 
to  formulate  complete  rules,  and  we  point  out  how  the  rules  we 
give  can  be  modified  to  apply  to  other  types  of  correctness. 

The  contributions  of  Part  IV,  like  those  of  Part  III,  are  both 
general  and  specific,  and  both  positive  and  negative.  We 
demonstrate  the  feasibility  of  constructing  operation  mapping 
processors  as  indicated  in  the  ANSI/SPARC  framework.  We  also 
give  specific  rules  for  specific  operations  and  operation 
mappings.  These  rules  can  be  generalized  or  translated  to  apply 
to  other  languages.  Undecidability  results  again  show  that 
mapping processors cannot always be built. We also show that
heavy  restrictions  are  needed  on  the  underlying  database  and  on 
the  mapping  in  order  to  produce  a  view  with  intermediate 
properties.  This  result  can  be  interpreted  as  showing  that  only 
a  very  limited  class  of  views  can  have  useful  additional  views 
defined  above  them. 

Part  V  provides  the  concluding  remarks  to  the  thesis.  In 
Chapter  8,  we  illustrate  how  the  results  from  Parts  III  and  IV 
can  be  used  to  decide  the  properties  of  mappings  between  other 
kinds  of  data  models.  We  use  a  hierarchical  model  to  illustrate 
this.  First  we  define  the  syntax  of  a  hierarchical  mapping 
language,  and  we  specify  its  semantics  in  terms  of  our  relational 
mapping  language.  Then  we  prove  that  the  hierarchical  mapping 
language  is  well-defined  by  using  the  rules  from  Part  III. 
Although  this  is  only  a  brief  illustration,  this  chapter  is  very 
significant  since  it  shows  the  widespread  area  of  applicability 
of  the  results  of  this  thesis. 

Chapter  9  discusses  future  work  related  to  this  research. 

Then follow the references, a glossary of terms and an appendix
containing  details  of  certain  proofs. 




PART  II 


Mathematical  Aspects  of  Mappings 


CHAPTER  2 


Abstract  Machines  and  Mappings 


Chapter 1 has introduced the notion of mapping in informal
terms. Examples of the problems involved with mappings were also
given. One of the difficulties in discussing or attacking these
problems (a difficulty not new to the database field) is the
superabundance of terminology. From ANSI/SPARC[ANSI75,77] we
have internal, conceptual and external databases;
materialization, dematerialization, transforms, mapping
processors, and a host of other concepts. Paolini and
Pelagatti[PaPe] used the terminology of many-sorted algebras and
homomorphisms. Dale and Dale[Dale76,77] talked about consistent
external schemas, computable predicates and reference schemas.
And these three citations are ones which agree relatively well.
When we also attempt to understand the interrelationships of
views, data translations, query translations, coexistence
models[Nijs76], interfaces, significations[Falk], etc., we
find it very difficult to relate any one body of work to another.

Our goal in this chapter is to formulate a useful set of
definitions for mappings. To be widely applicable, the
definitions should be as simple as possible. Thus, although
specific data models, e.g., Codasyl, and specific mapping
languages, e.g., that of SDDTTG[SDTG], can be very complicated,
the basic concepts of mappings are not and deserve simple
definitions. To formulate such definitions we will use basic
mathematics: the language of sets and functions. These
definitions will look like definitions of mappings from one
abstract machine to another.

This  chapter  contains  six  sections.  In  Section  1  we  introduce 
the  basic  concepts  of  data  models.  The  definitions  are  simple, 
but  they  contain  the  essential  ideas.  Then  in  Section  2  we 
define  mapping  models.  The  notion  of  mapping  model  at  the  same 
level  of  abstraction  as  data  model  has  only  briefly  been 
mentioned  before.  Here  we  give  a  precise  definition  of  what  a 
mapping  model  is. 

These  first  two  sections  provide  an  introduction  to  the 
relevant  concepts,  but  to  have  really  useful  definitions,  we  need 
to  take  database  constraints  into  account.  Section  3  first 
refines  our  definition  of  data  model  so  that  the  notion  of 
constraint  is  embodied.  With  this  new  definition  we  can 
distinguish  various  classes  of  structure  and  operation  mappings, 
depending  on  how  the  mapping  preserves  constraints. 

In  Section  4  we  generalize  the  definitions  of  data  model  and 
mapping model to include the possibility of nondeterministic data
model  operations. 

Section  5  compares  our  definitions  with  two  other  formalisms 
from  the  literature.  In  the  area  of  Programming  Languages, 
structured  programs  have  been  described  using  abstract  machines 
and  mappings.  We  compare  abstract  machines  with  our  notion  of 
data  model,  and  we  also  compare  the  two  mapping  concepts.  There 
are  many  points  of  similarity,  but  our  definitions  are  better 
suited to database management. We also compare our definitions
with a formalization of database mappings using abstract algebra.
We find that there are some significant shortcomings to the
abstract algebra approach.

Section 6 summarizes the chapter.




2.1.  Abstract  Data  Models 

To introduce mapping definitions, we need first to establish a
framework for talking about data models. We could simply
restrict attention to the relational model, since it is precisely
defined and well known. There is a multitude of data models,
however, which must explicitly be in the domain of discourse. We
must identify the characteristics all (or many) data models have
in common.

First,  data  models  have  schemas.  Each  data  model  determines  a 
language  whose  valid  strings  are  the  schemas. 

Second,  corresponding  to  each  schema  there  is  data  —  the 
actual  contents  of  the  database.  Each  possible  configuration  of 
the  database  is  called  a  state.  A  data  model  determines  sets  of 
states;  for  each  schema  there  is  a  set  of  states  conforming  to 
that  schema. 

The data language is the third important component of a data
model.

As  a  fourth  component  we  will  temporarily  include  replies  to 
data language requests.

There  seem  to  be  many  ways  to  describe  the  way  in  which  these 
four  basic  components  are  related.  We  first  present  a  simple  but 
useful  structure. 

If Sch is the set of schemas, we specify a function St whose
domain is Sch, such that if s ∈ Sch, St(s) is the set of states
conforming to schema s. The specification of St generates or
recognizes legal states, given a legal schema. We may also think
of St as a set of states parameterized by the set of schemas.

The  set  of  data  language  statements  could  be  described, 
independently  of  anything  else,  as  a  single  set  Q.  However,  the 
meaning  of  a  statement  can  be  a  function  of  the  particular  schema 
in  effect.  For  example,  in  Codasyl,  the  result  of  an  INSERT 
operation  depends  on  Set  Occurrence  Selection  clauses  in  the 
schema.  Therefore,  as  with  states,  we  should  hypothesize  a 
function Q whose domain is Sch, such that for each s ∈ Sch, Q(s)
is the set of "meaningful" requests with respect to the given
schema. In the Codasyl example mentioned, we would say that the
Set Occurrence Selection clause is an implicit part of the insert
operation. As with states, we may think of Q as a set of
statements parameterized by the set of schemas.

What we have called statements include updates, which alter
the contents of the database, and queries, which return
information only. An update may be described as a function
St(s) -> St(s) and a query as a function St(s) -> R, where R is the
set of possible replies. In some data languages, in particular
the record-oriented ones, queries such as the Codasyl FIND NEXT
may alter currency indicators which are part of the state. Also,
updates can often give replies such as error codes, number of
records updated, or cost of the update. Since the division
between updates and queries is hazy, we can avoid meaningless
distinctions by combining the functions of query and update by
assuming that the reply is a component of the state. That is,
both types of request may be considered to have functionality
St(s) -> St(s) x R. For pure queries, the first component of the
result is the unchanged argument state. There is actually no need
to make the reply component explicit, so for both kinds of
requests, we will simply write St(s) -> St(s) and assume that
states contain R as an implicit component.

Now if q ∈ Q(s), what kind of function St(s) -> St(s) is q? We
have in mind the fact that it is possible for updates to violate
integrity constraints. One approach would be to assume that the
state always has an error component and that an illegal request
changes the error code but makes no other change to the state.
This would result in requests being total functions. A second
possibility would be to let requests be partial functions such
that being defined for a given state was equivalent to not
producing any error. The situation is similar to the domains and
functions of denotational semantics [Tenn]. In that field,
domains contain "undefined" and "error" elements, and this allows
one to assume that functions on domains are total. This
simplifies notation considerably. Thus we will also assume that
our requests are total functions on states and that an implicit
error component takes care of undefined regions. We will make a
more suitable refinement in Section 2.3.


We summarize the above discussion with the following
definition:

Definition 2.1. An abstract data model is a triple (Sch,St,Q),
where Sch is a set whose elements are called schemas, St is a
set-valued function defined on Sch, and Q is a set-valued
function defined on Sch such that for every s ∈ Sch and q ∈ Q(s),
q is a (total) function St(s) -> St(s). For s ∈ Sch, elements of
St(s) are called states, and elements of Q(s) are called
operations.

□ 

Let us give an example of an abstract data model by defining a
simple relational model as a triple (Sch,St,Q):

Sch: A schema s ∈ Sch consists of a set of relation
declarations of the form 'name(degree)'.

St(s): x ∈ St(s) if for every relation declaration R(n) in
the given schema s, x associates a finite set of n-tuples
of integers with R.

Q(s): q ∈ Q(s) if q has the form 'insert(R,t)' or
'delete(R,t)', where R(n) is a declaration of s, and t is
an n-tuple of integers. By 'insert(R,t)' we denote the
function which on an argument x has a result which is like
x except that the set of n-tuples associated with R
contains t. A similar definition holds for the function
'delete(R,t)'.
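
The simple relational model above can be sketched in code. The following is only an illustration under stated assumptions: the names Schema, State, conforms, insert and delete are hypothetical and not part of the formalism, a schema is represented as a mapping from relation names to degrees, and a state as a mapping from relation names to finite sets of integer tuples.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Tup = Tuple[int, ...]
Schema = Dict[str, int]             # relation name -> degree, e.g. {"R": 2}
State = Dict[str, FrozenSet[Tup]]   # relation name -> finite set of tuples
Operation = Callable[[State], State]  # an element of Q(s)


def conforms(s: Schema, x: State) -> bool:
    """Membership test for St(s): one set of n-tuples per declaration R(n)."""
    return set(x) == set(s) and all(
        len(t) == s[name] for name, tuples in x.items() for t in tuples
    )


def insert(s: Schema, rel: str, t: Tup) -> Operation:
    """'insert(R,t)': the result is like x except that R contains t."""
    if len(t) != s[rel]:
        raise ValueError("t must be an n-tuple for the declaration R(n)")
    return lambda x: {**x, rel: x[rel] | {t}}


def delete(s: Schema, rel: str, t: Tup) -> Operation:
    """'delete(R,t)': the result is like x except that R does not contain t."""
    if len(t) != s[rel]:
        raise ValueError("t must be an n-tuple for the declaration R(n)")
    return lambda x: {**x, rel: x[rel] - {t}}


# A hypothetical schema with two binary relations and its empty state.
s = {"R": 2, "S": 2}
empty = {"R": frozenset(), "S": frozenset()}
x1 = insert(s, "R", (0, 1))(empty)
x2 = delete(s, "R", (0, 1))(x1)
```

Each operation is a total function on states, as Definition 2.1 requires; a tuple of the wrong degree is rejected when the operation is built, so it never becomes an element of Q(s).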

It  is  worthwhile  to  compare  Definition  2-1  of  data  model  with 
automata definitions: Automata definitions generally do not have
any component corresponding to schemas. A data model is like a
parameterized family of automata. Also the sets of states of a
data model are generally infinite. (Remember, we must distinguish
between  a  single  state,  which  is  always  finite,  and  the  set  of 
states  conforming  to  a  schema,  which  is  infinite.  This  assumes 
there  is  an  infinite  number  of  basic  data  values,  in  which  case 
an  infinite  set  of  distinct  insertions  will  produce  an  infinite 
number  of  distinct  states.)  By  the  same  token,  the  set  of 
operations  is  infinite. 


2.2.  Abstract  Mapping  Models

Given  the  above  formulation  for  the  concept  of  data  model,  we 
next  consider  an  appropriate  definition  of  mapping.  From  the 
diversity  of  applications  of  mappings  presented  in  Chapter  1,  it 
would  seem  that  a  single  definition  is  not  possible: 

In  a  distributed  database,  the  mappings  are  relatively 
symmetric.  Operations  can  originate  at  any  node  and  must  be 
mappable  to  operations  at  any  other  node.  On  the  other  hand,  in 
an ANSI/SPARC database, there is no symmetry of operations: The
conceptual level is "below" the external level, and the internal
level  is  "below"  the  conceptual  level. 

Also  in  an  ANSI/SPARC  database,  the  correspondence  of  schemas 
can be many-to-many. One conceptual schema can be bound to many
internal  schemas,  and  a  single  internal  schema  can  be  validated 
against  more  than  one  conceptual  schema.  Parameterized  external 
schemas  can  be  bound  in  many  ways  to  the  conceptual  schema.  On 
the  other  hand,  the  schema  binding  relationship  in  a  database 
terminal  environment  is  functional.  That  is,  for  every  schema  in 
the  central  database,  there  is  only  one  corresponding  schema  in 
the  database  terminal. 

These  differences  are  easily  overcome,  however.  The  symmetric 
mappings in a distributed database can be considered to be pairs
of  asymmetric  ANSI/SPARC-like  mappings.  The  functional  schema 
correspondences  in  a  database  terminal  can  be  considered  special 
cases  of  the  many-to-many  correspondences. 

It must also be realized that we should define mapping
concepts corresponding to the data model level of abstraction.
Consider, for example, data models Dm1=(Sch1,St1,Q1) and
Dm2=(Sch2,St2,Q2). It is not enough to define a state mapping
from Dm1 to Dm2 to be a function St1(s1) -> St2(s2), where s1 ∈
Sch1 and s2 ∈ Sch2. A proper definition will describe classes of
such functions: We are interested not just in the functions
St1(s1) -> St2(s2) which are mappings for some specific schemas s1
and s2; we want to talk about mappings involving all possible
schemas and operations in the data models considered. So suppose
we wish to consider Dm2-views defined on Dm1. First
of all, some objects are needed to express the schema
association. For each pair (s1,s2) of schemas, where s1 ∈ Sch1
and s2 ∈ Sch2, there is a set of associations of s1 with s2. An
association determines what the static Dm2-view is. If st1 is a
state in St1(s1), the static view is a corresponding Dm2-state
conforming to s2, i.e., a member of St2(s2). Thus the
association can be identified with a function St1(s1) -> St2(s2).
The correspondence of pairs (s1,s2) with the sets of state
functions will be denoted Ms (state mappings). Hence, given s1 ∈
Sch1 and s2 ∈ Sch2, Ms(s1,s2) is a set of state mappings
St1(s1) -> St2(s2). An element m ∈ Ms(s1,s2) may be thought of as
a view definition; for each st1 ∈ St1(s1), m(st1) is the Dm2-view
of the Dm1-state st1. If, for example, both data models are the
example relational model given in the last section, then we might
have some (very simple) state mappings as follows:

s1: R(2), S(2), T(2)
s2: U(4)
Ms(s1,s2) = {U = R[2=1]S, U = R[2=1]T}

(R[2=1]S and R[2=1]T are joins on the second domain of R with
the first domain of S and T.)
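
As a rough sketch, the two state mappings in Ms(s1,s2) can be written as functions from s1-states to s2-states. The helper join and the function names below are assumptions introduced for illustration; domain positions are 1-based as in the text.

```python
from typing import Dict, FrozenSet, Tuple

Tup = Tuple[int, ...]
State = Dict[str, FrozenSet[Tup]]


def join(left: FrozenSet[Tup], right: FrozenSet[Tup], i: int, j: int) -> FrozenSet[Tup]:
    """left[i=j]right: concatenate tuples whose i-th and j-th domains agree."""
    return frozenset(a + b for a in left for b in right if a[i - 1] == b[j - 1])


def u_is_r_join_s(st1: State) -> State:
    """The view definition U = R[2=1]S."""
    return {"U": join(st1["R"], st1["S"], 2, 1)}


def u_is_r_join_t(st1: State) -> State:
    """The view definition U = R[2=1]T."""
    return {"U": join(st1["R"], st1["T"], 2, 1)}


# Ms(s1,s2) as a collection of state mappings St1(s1) -> St2(s2).
Ms = [u_is_r_join_s, u_is_r_join_t]

# A hypothetical s1-state, used only to exercise the two views.
st1 = {
    "R": frozenset({(0, 1)}),
    "S": frozenset({(1, 5)}),
    "T": frozenset({(1, 7), (2, 8)}),
}
views = [m(st1) for m in Ms]
```

The same s1-state yields a different Dm2-view under each element of Ms(s1,s2), which is the sense in which an element m is a view definition.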

In general, given any s1 ∈ Sch1, there will be many Dm2-schemas
s2 such that Ms(s1,s2) is nonempty, and, in fact, Ms(s1,s2) may
contain many state mappings. That is, for each s1 ∈ Sch1, there
may be many s2 ∈ Sch2 which can be mapped to s1, and this mapping
may be done in many ways. The case of the database terminal is
special, however. Now, for every s1 ∈ Sch1, there is exactly one
s2 ∈ Sch2 such that Ms(s1,s2) is nonempty, and for this s2,
Ms(s1,s2) contains exactly one state mapping. We will not need
to introduce any special notation for these more "functional"
mappings.

Now suppose we are given s1 ∈ Sch1, s2 ∈ Sch2 and m ∈
Ms(s1,s2). The static view is specified by m, but to have a
complete dynamic view, we also need to interpret Dm2-operations
on the "actually existing" Dm1-states. So for each pair (s1,s2)
as above, there will be specified a set of possible operation
interpretations. Suppose that t is such an interpretation. Let
us determine the appropriate functionality for t. Since t is
translating operations in Q2(s2), we certainly want this set as
part of the domain of t. It must also be assumed that the
translation depends on the initial state to which the translation
is applied. This means that St1(s1) should also be a component
of the domain of t. There is no need to also include St2(s2) in
the domain because the Dm2-state is determined by the given state
mapping m applied to the St1(s1) component of the domain. It is
not sufficient to let Q1(s1) be the range of t. This is because a
single Dm1-operation may not be enough to simulate a
Dm2-operation (see the next example); several Dm1-operations may
be necessary. Thus we define the range of t to be Q1(s1)*, the
set of finite sequences of operations from Q1(s1). There is no
requirement to have any more complicated structure for the range.
Thus the functionality of t should be Q2(s2) x St1(s1) -> Q1(s1)*.
The interpretation is that if t(q,st1) is (q1,...,qn), we first
apply q1 to st1, then q2 to the resulting state, and so on, until
qn is applied. A simple example of an operation mapping (one
that does not depend on the state) is the following:

s1: R(2), S(2)
s2: T(4)
m: T = R[2=1]S
q: insert(T,(0,1,2,3))
t(q): insert(R,(0,1)); insert(S,(2,3))

Now t cannot be just any function with the domain and range as
above. It must interpret the Dm2-operations "correctly" in Dm1.
Suppose that we have an operation q ∈ Q2(s2) and that the
Dm2-view is st2. We also assume that for some st1 ∈ St1(s1), st2
is the view provided by m of st1, i.e., st2=m(st1). If st2 were a
"real" state rather than a "virtual" one, the database would be
in state q(st2) after executing q. To the user of this view, it
is irrelevant that st2 is not "real"; he still wants to see
q(st2) after invoking q. Hence whatever Dm1-state t leaves
the database in, it should give a view via m which is identical
to q(st2). In mathematical notation, this reads:

q(m(st1)) = m(t(q,st1)(st1)).


To keep the formula simple, we have let the sequence
t(q,st1) = (q1,...,qn), which is an element of Q1(s1)*, also stand
for the composition qn∘...∘q1 when applied to an element of
St1(s1). Thus t(q,st1)(st1) is the Dm1-state reached by first
applying q1 to st1, then q2 to q1(st1), etc. The notation
t(q,st1)(st1) may seem redundant, since it has two occurrences of
st1. However, as noted above, we cannot write t(q)(st1) because
the translation may depend on the state; and we cannot write just
t(q,st1) for the resulting state, since otherwise it would not be
possible to talk about the operation sequence q1,...,qn as
distinct from its application to st1. (As a function, t(q,st1) is
applicable to other states besides st1; thus t(q,st1)(st1') is at
least mathematically meaningful.) The above property of t is
often represented graphically by stating that the following
diagram is commutative:


[Commutative diagram: m followed by q equals t(q,st1) followed by m.]
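
The commutativity condition q(m(st1)) = m(t(q,st1)(st1)) can be checked mechanically for a given pair (q,st1). The sketch below is illustrative only: operations are encoded as (kind, relation, tuple) triples, and the view used to exercise the check (T is a renamed copy of R) is a hypothetical example that does not appear in the text.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Tup = Tuple[int, ...]
State = Dict[str, FrozenSet[Tup]]
Op = Tuple[str, str, Tup]  # (kind, relation name, tuple); kind is "insert" or "delete"


def apply(op: Op, st: State) -> State:
    """Apply a single insert or delete operation to a state."""
    kind, rel, tup = op
    new = st[rel] | {tup} if kind == "insert" else st[rel] - {tup}
    return {**st, rel: new}


def run(ops: List[Op], st: State) -> State:
    """Apply the sequence (q1,...,qn) in order, i.e. the composition qn o ... o q1."""
    for op in ops:
        st = apply(op, st)
    return st


def correctly_interprets(
    m: Callable[[State], State],
    t: Callable[[Op, State], List[Op]],
    q: Op,
    st1: State,
) -> bool:
    """True if q(m(st1)) == m(t(q,st1)(st1))."""
    return apply(q, m(st1)) == m(run(t(q, st1), st1))


# Hypothetical view, for illustration: s1 has R(2), s2 has T(2), and T = R.
def m(st1: State) -> State:
    return {"T": st1["R"]}


def t(q: Op, st1: State) -> List[Op]:
    kind, _rel, tup = q
    return [(kind, "R", tup)]  # state independent: ignores st1


def t_bad(q: Op, st1: State) -> List[Op]:
    return []  # translates every view operation to the empty sequence


st1 = {"R": frozenset({(0, 1)})}
q = ("insert", "T", (2, 3))
```

Keeping t(q,st1) as a list of operations, separate from its application by run, mirrors the distinction the text draws between the operation sequence and the state it produces.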


We summarize the above discussion with the following
definition.

Definition 2.2. Let Dm1=(Sch1,St1,Q1) and Dm2=(Sch2,St2,Q2) be
abstract data models. An abstract mapping model M from Dm1 to
Dm2 is a pair (Ms,Mq) such that the following is true: Ms is a
set-valued function defined on pairs (s1,s2) ∈ Sch1 x Sch2, where
if m ∈ Ms(s1,s2), then m is a function St1(s1) -> St2(s2). Mq is a
set-valued function also defined on pairs (s1,s2) such that if t
∈ Mq(s1,s2), then t is a function Q2(s2) x St1(s1) -> Q1(s1)*.
If t has the property that for each q ∈ Q2(s2) and st ∈ St1(s1),
q(m(st)) = m(t(q,st)(st)), then t is correct with respect to m.
Elements of Ms(s1,s2) are called state mappings, and elements of
Mq(s1,s2) are called operation mappings. The operation mapping t
is state independent if it can be described as a function
Q2(s2) -> Q1(s1)*.

□


We will give an example of a mapping model defined on the example
data model following Definition 2.1:

Ms(s1,s2): A member m of this set is a set of equations of the
form R=e, where R is an n-ary relation in schema s2 and e is a
relational algebra expression of degree n on relations in s1.
For every R in s2 there is exactly one such equation in m.

Mq(s1,s2): A member t of this set is a set of associations of
the form op -> op1;...;opn, where op is an element of Q(s2) and
each opi (i=1,...,n) is an element of Q(s1). For every op ∈
Q(s2) there is exactly one such association in t. (Thus, t
has infinite cardinality.) Note that t does not depend on
St(s1). The value t(q) of operation mapping t on s2-operation
q is the right-hand side of the association whose left-hand
side is q.

2.3.  Adding  the  Notion  of  Constraint

One  important  aspect  of  data  models  which  was  ignored  in  the 
above  definitions  is  the  concept  of  constraint.  Constraints  are 
objects in schemas which restrict the structure of states which
can belong to the sets St(s). They divide fairly evenly into two
classes:  inherent  constraints  and  adjunctive  constraints.  To 
give  an  example,  in  a  hierarchical  data  model,  the  constraint 
that  all  database  states  are  forests  of  trees  is  inherent:  it  is 
part  of  the  hierarchical  model.  On  the  other  hand,  the 
specification  of  hierarchical  keys  is  adjunctive:  the 
specification  must  be  explicit  in  the  schema,  and  its  details  may 
vary  from  schema  to  schema.  The  essential  distinction  (which  we 
can  take  as  a  definition)  between  the  two  types  of  constraints  is 
that  inherent  constraints  cannot  be  violated  by  any  data  model 
operation,  whereas  adjunctive  constraints  can  be.  The  kinds  of 
things  which  operations  do  cannot  possibly  change  the  basic 
(inherent)  form  of  states.  Thus  in  a  hierarchical  model, 
inserts,  deletes  or  modifications  can  never  create  a  state  which 
is  not  a  forest  of  trees,  but  these  operations  can  destroy  key 
properties (if allowed to complete).

It is not difficult to incorporate these concepts into the
definition of an abstract data model. Given a schema s ∈ Sch,
we now postulate two sets, Str(s) and St(s). An element of
Str(s) is called a structure. Structures satisfy the inherent
constraints of a schema but not necessarily the adjunctive ones.
The elements of St(s) are called states, and each state is also a
structure, i.e., St(s) ⊆ Str(s). States satisfy both the
inherent and the adjunctive constraints. Operations are
applicable to structures and produce other structures as results.
Thus, if q ∈ Q(s), q is a function Str(s) -> Str(s). If st is a
state, then q(st) may or may not be a state. However, if st ∈
St(s) and q(st) ∈ St(s), then we say that q is consistent with
st. Hence, if q is consistent with st, then executing q will
cause no constraint violations. We summarize this discussion
with the following definition which replaces Definition 2.1.

Definition 2.3. An abstract data model is a 4-tuple
(Sch,Str,St,Q) such that Sch is a set, and Str, St and Q are
set-valued functions defined on Sch satisfying the conditions that
for each s ∈ Sch, St(s) ⊆ Str(s), and if q ∈ Q(s), then q is a
function Str(s) -> Str(s). Elements of Str(s) are called
structures. The other components are named as before. If q ∈
Q(s) and st ∈ St(s), then if also q(st) ∈ St(s) (rather than just
in Str(s)), then q is said to be consistent with st.

□

The example following Definition 2.1 can be modified to agree
with the refined definition. Thus we will define a relational
data model as a 4-tuple (Sch,Str,St,Q):

Sch: s ∈ Sch if s consists of a set of relation declarations
of the form 'name(degree)', and a set of "functional
dependency" constraints of the form 'name: X -> Y'. A
functional dependency R: X -> Y means that in every state, the
tuple set for R defines a function from the X domains to
the Y domain.

Str(s): x ∈ Str(s) if for every relation declaration R(n) in
s, x associates a finite set of n-tuples of integers with R.


St(s): x ∈ St(s) if x ∈ Str(s) and for every functional
dependency R: X -> Y the tuple set associated with R by x
defines a function from the X domains to the Y domain.

Q(s): inserts and deletes as before.
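
The distinction between Str(s) and St(s) in this refined relational model can be illustrated with a small sketch. The encoding is an assumption made for illustration: a functional dependency R: X -> Y is written as a triple (relation, X positions, Y position) with 1-based positions, and the names is_structure, is_state and consistent_with are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Tup = Tuple[int, ...]
Structure = Dict[str, FrozenSet[Tup]]
FD = Tuple[str, Tuple[int, ...], int]  # (relation, X domains, Y domain), 1-based


def satisfies_fd(tuples: Iterable[Tup], lhs: Tuple[int, ...], rhs: int) -> bool:
    """True if the tuple set defines a function from the X domains to the Y domain."""
    image: Dict[Tup, int] = {}
    for tup in tuples:
        key = tuple(tup[i - 1] for i in lhs)
        if image.setdefault(key, tup[rhs - 1]) != tup[rhs - 1]:
            return False
    return True


def is_structure(degrees: Dict[str, int], x: Structure) -> bool:
    """Membership in Str(s): only the inherent (degree) constraints."""
    return set(x) == set(degrees) and all(
        len(t) == degrees[r] for r, ts in x.items() for t in ts
    )


def is_state(degrees: Dict[str, int], fds: List[FD], x: Structure) -> bool:
    """Membership in St(s): a structure that also satisfies every FD."""
    return is_structure(degrees, x) and all(
        satisfies_fd(x[r], lhs, rhs) for r, lhs, rhs in fds
    )


def insert(rel: str, t: Tup) -> Callable[[Structure], Structure]:
    return lambda x: {**x, rel: x[rel] | {t}}


def consistent_with(q, st: Structure, degrees: Dict[str, int], fds: List[FD]) -> bool:
    """q is consistent with st if st is a state and q(st) is again a state."""
    return is_state(degrees, fds, st) and is_state(degrees, fds, q(st))


# Hypothetical schema: R(2) with the functional dependency R: 1 -> 2.
degrees = {"R": 2}
fds = [("R", (1,), 2)]
st = {"R": frozenset({(0, 1)})}
good = insert("R", (1, 1))   # keeps R: 1 -> 2
bad = insert("R", (0, 2))    # maps 0 to both 1 and 2
```

Here bad(st) is still a structure, since the degree constraint cannot be violated by an insert, but it is not a state; this is the difference between inherent and adjunctive constraints.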

Using  Definition  2.3  of  abstract  data  models,  we  can  model  the 
primitive  behavior  of  mappings  with  respect  to  constraints. 

The least restrictive form of behavior is when the mapping is
not cognizant of constraints at all. In this situation, there
would be structure mappings Str1(s1) -> Str2(s2) (the analog of
state mappings in abstract data models), but these would not
necessarily map states (elements of St1(s1)) to states (elements
of St2(s2)). That is, given m ∈ Ms(s1,s2), m is a function
Str1(s1) -> Str2(s2), but the best we can write for restricting m
to states is the functionality m: St1(s1) -> Str2(s2) (i.e., not
St1(s1) -> St2(s2)). This means that a consistent view may not be
available via m for every state of the underlying database.
Consistent with the generalization of states to structures, the
set Mq(s1,s2) would contain translations with functionality
Q2(s2) x Str1(s1) -> Q1(s1)*, and the aforementioned "commutative
diagram" equation would read q(m(str1)) = m(t(q,str1)(str1)),
where now str1 ∈ Str1(s1). These kinds of mappings have been
introduced mainly for completeness; their behavior is not very
desirable. This is because a user of the s2 view could never be
sure that all of the constraints in s2 were satisfied.

The first meaningful restriction which can be placed on
mappings between abstract data models is that structure mappings
take states to states. In mathematical terms, we have that if m
∈ Ms(s1,s2), then, as above, m is a function Str1(s1) -> Str2(s2),
but also that m restricted to St1(s1) gives a function
St1(s1) -> St2(s2). Interpreted in database terms, this says that,
assuming that the underlying s1-constraints are never allowed to
be violated, the contents of the database will always correspond
to an element of St1(s1), and this means that the view defined by
m will always be "consistent", i.e., will be an element of
St2(s2). In other words, the s2-constraints are satisfied. When
the structure mapping m has this property, it will be called
"consistent" or a "state mapping" as opposed to a "structure
mapping" on which there are no restrictions. We give an example
of a state mapping and a structure mapping which is not a state
mapping:

s1: S(3), T(3)
    S: 1->2, T: 1->3
s2: R(3)
    R: 1->2, R: 1->3
m: R = S ∩ T (an intersection)

This structure mapping is consistent: If st1 is any state of s1,
the FDs 1->2 and 1->3 will hold on the tuple sets for S and T,
respectively, and therefore both FDs will hold on the
intersection of the tuple sets, i.e., m(st1) will be a state of
schema s2.

s1: S(3), T(3)
    S: 1->2
s2: R(3)
    R: 1->2
m: R = S+T (a union)

This structure mapping is not consistent: Adding the tuples from
T to those of S can destroy the FD 1->2. Hence, if st is a
state of schema s1, m(st) need not be a state of schema s2.
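
The two examples above can be spot-checked in code. This is a sketch under assumptions, not a proof: the intersection mapping is tested on randomly generated states over a small hypothetical value domain, and the union mapping is refuted by one hand-picked state. The helper names are illustrative.

```python
import random
from typing import FrozenSet, Iterable, Optional, Tuple

Tup = Tuple[int, int, int]


def satisfies_fd(tuples: Iterable[Tup], lhs: int, rhs: int) -> bool:
    """True if domain lhs functionally determines domain rhs (1-based positions)."""
    image = {}
    for tup in tuples:
        if image.setdefault(tup[lhs - 1], tup[rhs - 1]) != tup[rhs - 1]:
            return False
    return True


def random_relation(rng: random.Random, fd: Optional[Tuple[int, int]]) -> FrozenSet[Tup]:
    """A small random 3-ary relation that satisfies fd, if one is given."""
    tuples = set()
    for _ in range(6):
        cand = (rng.randrange(3), rng.randrange(3), rng.randrange(3))
        if fd is None or satisfies_fd(tuples | {cand}, *fd):
            tuples.add(cand)
    return frozenset(tuples)


# First example: S: 1->2, T: 1->3, R = S intersect T, with R: 1->2 and R: 1->3.
rng = random.Random(0)
intersection_ok = True
for _ in range(500):
    s_rel = random_relation(rng, (1, 2))
    t_rel = random_relation(rng, (1, 3))
    r_rel = s_rel & t_rel
    intersection_ok &= satisfies_fd(r_rel, 1, 2) and satisfies_fd(r_rel, 1, 3)

# Second example: S: 1->2, no constraint on T, R = S+T with R: 1->2.
s_rel = frozenset({(0, 1, 0)})
t_rel = frozenset({(0, 2, 0)})
union_rel = s_rel | t_rel
```

The intersection is a subset of both S and T, so it inherits each functional dependency; the union can pair the same first value with two different second values.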

Now  consider  operation  mappings.  First,  it  is  not  meaningful 
to  place  restrictions  on  operation  mappings  when  the  associated 
structure  mapping  is  not  consistent;  for  if  the  view  is  not 
defined,  it  makes  no  difference  how  well  behaved  an  operation 
mapping might be. So let us assume that m ∈ Ms(s1,s2) is
consistent, and let t be a function Q2(s2) x St1(s1) -> Q1(s1)*.
We want to specify preconditions under which t will satisfy the
commutative diagram. The weakest precondition (condition 1) we
may apply to t is that it correctly interprets all (q,st) pairs
where st ∈ St1(s1). (By "t correctly interprets pair (q,st)" we
mean that q(m(st)) = m(t(q,st)(st)), i.e., that the commutative
diagram holds.) We are not interested in elements of Str1(s1)
which are not states since applying m to them need not result in
an s2-state.


A stronger precondition (condition 2) would be that t correctly
interprets all (q,st) pairs where q is consistent with m(st) and
st ∈ St1(s1). We have restricted the cases when t must correctly
interpret q to those cases where q does not violate any
s2-constraint. Since operations on views should not violate view
constraints, this is a useful criterion.

The third condition (condition 3) is that t correctly
interprets all (q,st) pairs where st ∈ St1(s1) and t(q,st)(st) ∈
St1(s1). This condition means that we do not care about correct
interpretation when the s1-constraints are violated; in real
systems such operations are generally not allowed to complete.

These three conditions are not independent. Each says that t
must satisfy the commutative diagram for a certain set of
(q,st)-pairs. The larger the set of these pairs, the more
well-behaved t will be. The sets of pairs and their subset
relationships are the following:

C1 = {(q,st) : st ∈ St1(s1)}
C2 = {(q,st) : st ∈ St1(s1) and q(m(st)) ∈ St2(s2)}
C3 = {(q,st) : st ∈ St1(s1) and t(q,st)(st) ∈ St1(s1)}

C3 ⊆ C2 ⊆ C1

Thus, as a requirement on t, condition 3 is the weakest;
condition 2 is stronger than condition 3, and condition 1 is
stronger than both condition 2 and condition 3. A mapping t which
satisfies the commutative diagram for the set C1 will also
satisfy it for C2, and if it satisfies the commutative diagram
for C2 then it will satisfy it for C3. The following are some
examples of these three types of operation mappings:

s1: S(2), T(2)
s2: R(4)
m: R = S[2=1]T (a join on the second domain of S with
   the first domain of T)
t: insert(R,(w,x,y,z)) -> insert(S,(w,x)); insert(T,(y,z))
   delete(R,(w,x,y,z)) -> delete(S,(w,x))

Since there are no constraints, m is trivially consistent.
However, t is not even type 3; in fact, the commutative diagram
will fail in general for t. To see this, consider the s1-state
st1 = (S=∅; T={(1,2)}). The corresponding s2-state is m(st1) =
(R=∅). Let q be insert(R,(0,1,1,3)). Then q(m(st1)) = q(R=∅) =
(R={(0,1,1,3)}), while t(q,st1)(st1) =
insert(S,(0,1));insert(T,(1,3)) (S=∅; T={(1,2)}) =
(S={(0,1)}; T={(1,2),(1,3)}). Applying m to this state, we get
m(t(q,st1)(st1)) = (R={(0,1,1,2),(0,1,1,3)}) which does not equal
q(m(st1)). Hence the commutative diagram fails. There are
similar problems in translating deletes.
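
The failing case just described can be replayed step by step. The sketch below encodes the example's m and t directly; the encoding of operations as (kind, relation, tuple) triples and the helper names are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Tup = Tuple[int, ...]
State = Dict[str, FrozenSet[Tup]]
Op = Tuple[str, str, Tup]  # (kind, relation name, tuple)


def apply(op: Op, st: State) -> State:
    kind, rel, tup = op
    new = st[rel] | {tup} if kind == "insert" else st[rel] - {tup}
    return {**st, rel: new}


def run(ops: List[Op], st: State) -> State:
    for op in ops:
        st = apply(op, st)
    return st


def join(left: FrozenSet[Tup], right: FrozenSet[Tup], i: int, j: int) -> FrozenSet[Tup]:
    return frozenset(a + b for a in left for b in right if a[i - 1] == b[j - 1])


def m(st1: State) -> State:
    """m: R = S[2=1]T."""
    return {"R": join(st1["S"], st1["T"], 2, 1)}


def t(q: Op, st1: State) -> List[Op]:
    """The operation mapping of the example; it ignores st1."""
    kind, _rel, (w, x, y, z) = q
    if kind == "insert":
        return [("insert", "S", (w, x)), ("insert", "T", (y, z))]
    return [("delete", "S", (w, x))]


st1 = {"S": frozenset(), "T": frozenset({(1, 2)})}
q = ("insert", "R", (0, 1, 1, 3))

wanted = apply(q, m(st1))          # q(m(st1))
final_s1 = run(t(q, st1), st1)     # t(q,st1)(st1)
actual = m(final_s1)               # m(t(q,st1)(st1))
```

The extra view tuple (0,1,1,2) arises because the newly inserted S-tuple also joins with the T-tuple (1,2) that was already present.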

Now  consider: 

s1: S(2), T(2)
    S: 1->2, T: 1->2
s2: R(4)
m: R = S[2=1]T
t: insert(R,(w,x,y,z)) -> insert(S,(w,x)); insert(T,(y,z))
   delete(R,(w,x,y,z)) -> delete(S,(w,x))

This is like the previous example except that S and T each have
an FD declared in their schema. This causes t to have the type 3
property which says that as long as the beginning and final
s1-structures satisfy the schema constraints, t will satisfy the
commutative diagram. The diagram will still fail for
st1 = (S=∅; T={(1,2)}) and q = insert(R,(0,1,1,3)) as above, but
now this case does not count because the final s1-structure,
(S={(0,1)}; T={(1,2),(1,3)}), does not satisfy the s1-constraints
(namely, T: 1->2). Thus the type 3 condition causes all starting
states of s1 which might cause a "connection trap" in the final
state to be eliminated from consideration when checking the
commutative diagram.

Next  consider: 

s1: S(2), T(2)
    S: 1->2, T: 1->2
    S[1] ⊆ T[1], T[1] ⊆ S[1]
s2: R(4)
    R: 1->2, R: 3->4
m: R = S[1=1]T (a join on the first domains of S and T)
t: insert(R,(w,x,y,z)) -> insert(S,(w,x)); insert(T,(y,z))
   delete(R,(w,x,y,z)) -> delete(S,(w,x)); delete(T,(y,z))


The structure mapping is consistent since R: 1->2 follows from
S: 1->2, and R: 3->4 follows from T: 1->2. We can show that in this
example, the operation mapping t is type 2. The type 2 condition
states that if the beginning s1-structure satisfies the
s1-constraints (and, consequently, the s2-constraints are
satisfied), and if, according to the definition of the
s2-operation q, the s2-constraints should not be violated, then the
operation mapping t correctly interprets q on st1. In this
example, the s2-constraints are exactly equivalent to the
s1-constraints: If the s2-constraints are satisfied, then the
s1-constraints are also satisfied, and therefore no "connection
trap" will occur. Note that the subset constraints in schema s1
are necessary. They have the effect that the first domains of S
and T are always equal so that the join will not leave out any
tuples from the s2-view. This is essential for the
s2-constraints to imply those of s1.

The next example is:

s1: S(2), T(2)
s2: R(2)
m: R = S+T
t: insert(R,(x,y)) -> insert(S,(x,y))
   delete(R,(x,y)) -> delete(S,(x,y)); delete(T,(x,y))

This  is  an  example  of  a  type  1  operation  mapping.  The 
commutative  diagram  will  always  be  satisfied. 
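
The type 1 claim for the union example can be checked exhaustively over a small value domain. The sketch below is a finite check, not a proof: it restricts tuple values to the hypothetical domain {0,1} and tests every s1-state and every insert or delete on the view. The encoding of operations as triples is an illustrative assumption.

```python
from itertools import chain, combinations, product
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Tup = Tuple[int, ...]
State = Dict[str, FrozenSet[Tup]]
Op = Tuple[str, str, Tup]  # (kind, relation name, tuple)

DOMAIN = (0, 1)                             # assumed tiny value domain
TUPLES = list(product(DOMAIN, repeat=2))    # all binary tuples over DOMAIN


def subsets(items: Iterable[Tup]) -> Iterator[FrozenSet[Tup]]:
    items = list(items)
    return (
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    )


def apply(op: Op, st: State) -> State:
    kind, rel, tup = op
    new = st[rel] | {tup} if kind == "insert" else st[rel] - {tup}
    return {**st, rel: new}


def run(ops: List[Op], st: State) -> State:
    for op in ops:
        st = apply(op, st)
    return st


def m(st1: State) -> State:
    """m: R = S+T (a union)."""
    return {"R": st1["S"] | st1["T"]}


def t(q: Op, st1: State) -> List[Op]:
    kind, _rel, tup = q
    if kind == "insert":
        return [("insert", "S", tup)]
    return [("delete", "S", tup), ("delete", "T", tup)]


# Collect every (q, st1) pair for which the commutative diagram fails.
failures = [
    (q, st1)
    for s_set, t_set in product(list(subsets(TUPLES)), repeat=2)
    for st1 in [{"S": s_set, "T": t_set}]
    for q in product(("insert", "delete"), ("R",), TUPLES)
    if apply(q, m(st1)) != m(run(t(q, st1), st1))
]
```

Since this schema declares no constraints, every structure is a state, so the pairs enumerated here are the restriction of C1 to the chosen domain.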

We have defined three different properties which a translation
t may have with respect to the correct interpretation of
Dm2-operations. We can also ask what t does to the constraints:

Suppose that q is consistent with respect to m(st1), i.e.,
that q(m(st1)) is also in St2(s2). How does the operation
mapping t affect the s1-state? It might leave all s1-constraints
satisfied; that is, we might have t(q,st1)(st1) ∈ St1(s1), or t
may cause some constraint violations; that is, t(q,st1)(st1) ∈
Str1(s1) but t(q,st1)(st1) ∉ St1(s1). If the former (desirable)
condition holds for every st1 ∈ St1(s1) and every q ∈ Q2(s2)
consistent with the view m(st1), then we say that t is
consistent.




An  example  is  the  following: 

s1:  S(2), T(2)

     S: 1->2,  T: 1->2
     S[1] ⊆ T[1],  T[1] ⊆ S[1]

s2:  R(4)

     R: 1->2,  R: 3->4

m:   R = S[1=1]T

t:   insert(R,(w,x,y,z)) -> insert(S,(w,x)); insert(T,(y,z))
     delete(R,(w,x,y,z)) -> delete(S,(w,x)); delete(T,(y,z))

The operation mapping t is consistent because the view will
always contain every tuple in S and T. Thus if the insertion of
(w,x,y,z) into the view causes no constraint violation, then the
insertion of (w,x) into S and (y,z) into T also will not cause any
constraint violation.

Another (weaker) form of behavior is possible for t with
respect to constraint violations. It may happen that
t(q,st1)(st1) ∈ Str1(s1)-St1(s1), i.e., that the translation
t(q,st1) violates an s1-constraint even though q is consistent.
However, there may be some sequence q1,...,qn of Dm1-operations
such that qn°...°q1°t(q,st1)(st1) ∈ St1(s1) and such that the
image under m of this new state has not changed, i.e., it still
equals q(m(st1)). Intuitively, this says that although t may
cause some s1-constraint violations, these can be corrected
without affecting the Dm2-view. The operations q1,...,qn can be
thought of as triggers, and we can call such mappings
tr-consistent.

An  example  follows: 

s1:  S(2), T(2), U(2)

     U[1] ⊆ T[2]

s2:  R(4)

m:   R = S[2=1]T

t:   insert(R,(w,x,y,z)) -> insert(S,(w,x)); insert(T,(y,z))

     delete(R,(w,x,y,z)) -> delete(S,(w,x)); delete(T,(y,z))

The operation mapping t is tr-consistent. The subset constraint
in s1 can be violated by translations of deletes on R. If, say,
(0,1) is in U and (2,0) is in T and is the only tuple in T with a
'0' in column 2, then the s2-operation delete(R,(3,2,2,0)) will
cause a constraint violation in s1. But if we add to the
translation, which is delete(S,(3,2)); delete(T,(2,0)), the
operation delete(U,(0,1)), then the violation will be corrected
without changing the s2-view.
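The scenario just described can be replayed on a concrete instance.
This is a minimal sketch, assuming relations are Python sets of pairs
and a state is a dictionary with keys "S", "T" and "U"; the function
names, and in particular the generic trigger that removes every
offending U tuple, are hypothetical and chosen only for illustration.

```python
def join_view(st):
    """m: R = S[2=1]T, joining column 2 of S with column 1 of T."""
    return {(w, x, y, z)
            for (w, x) in st["S"] for (y, z) in st["T"] if x == y}


def is_state(st):
    """The s1-constraint: U[1] is a subset of T[2]."""
    return {u[0] for u in st["U"]} <= {t[1] for t in st["T"]}


def translate_delete(st, tup):
    """t for delete(R,(w,x,y,z)): delete(S,(w,x)); delete(T,(y,z))."""
    w, x, y, z = tup
    return {"S": st["S"] - {(w, x)},
            "T": st["T"] - {(y, z)},
            "U": set(st["U"])}


def trigger(st):
    """Hypothetical trigger: delete the U tuples breaking the constraint."""
    allowed = {t[1] for t in st["T"]}
    return {**st, "U": {u for u in st["U"] if u[0] in allowed}}


before = {"S": {(3, 2)}, "T": {(2, 0)}, "U": {(0, 1)}}
after = translate_delete(before, (3, 2, 2, 0))    # violates U[1] ⊆ T[2]
repaired = trigger(after)                         # same view, legal state
```

Because U does not take part in the join, the trigger cannot change
the s2-view, which is the property tr-consistency asks for.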

The three conditions specified by C1, C2 and C3 are
independent from the two notions of consistency: the former
specify when the commutative diagram holds; the latter specify
when legal view operations will be translated to legal base
operations.

These mappings are formally defined as follows, and the
definition supersedes the preliminary one:

Definition 2.4. Let Dm1 = (Sch1,Str1,St1,Q1) and Dm2 =
(Sch2,Str2,St2,Q2) be abstract data models. An abstract mapping
model from Dm1 to Dm2 is a pair (Ms,Mq) such that Ms and Mq are
set-valued functions defined on Sch1 x Sch2. If m ∈ Ms(s1,s2),
then m is a function Str1(s1)->Str2(s2). If t ∈ Mq(s1,s2), then
t is a function Q2(s2) x Str1(s1)->Q1(s1)*. Let s1 ∈ Sch1, s2 ∈
Sch2, m ∈ Ms(s1,s2) and t ∈ Mq(s1,s2). Then m is consistent if
st ∈ St1(s1) implies m(st) ∈ St2(s2), i.e., if m restricts to a
function St1(s1)->St2(s2). Given consistent m, the operation
mapping t is of type i (i=1,2,3) with respect to m if the
corresponding clause below holds, where st ∈ Str1(s1) and q ∈
Q2(s2):

[1] if st ∈ St1(s1), then q(m(st)) = m(t(q,st)(st)).

[2] if st ∈ St1(s1) and q(m(st)) ∈ St2(s2), then q(m(st)) =
    m(t(q,st)(st)).

[3] if st ∈ St1(s1) and t(q,st)(st) ∈ St1(s1), then q(m(st)) =
    m(t(q,st)(st)).

The operation mapping t is consistent with respect to m if
t(q,st)(st) ∈ St1(s1) whenever st ∈ St1(s1) and q is consistent
with m(st). The operation mapping t is tr-consistent with
respect to m ('tr' for 'trigger') if whenever st ∈ St1(s1) and q
is consistent with m(st), there are Dm1-operations q1,...,qn such
that qn°...°q1°t(q,st)(st) ∈ St1(s1) and m(qn°...°q1°t(q,st)(st))
= m(t(q,st)(st)).


□
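For finite models, the clauses of Definition 2.4 can be tested
directly. The sketch below is illustrative only: it assumes that
states are hashable Python values, that operations are ordinary
functions, and that t(q,st) returns a list of base operations; the
counter model at the end is a made-up toy, not an example from the
text.

```python
def run(ops, st):
    """Apply a translation (q1,...,qn) to st, q1 first (qn°...°q1)."""
    for op in ops:
        st = op(st)
    return st


def mapping_types(St1, St2, Q2, m, t):
    """Return the set of i in {1,2,3} such that t is of type i."""
    holds = {1, 2, 3}
    for st in St1:
        for q in Q2:
            expected = q(m(st))
            result = run(t(q, st), st)
            if expected == m(result):
                continue               # the diagram commutes here
            holds.discard(1)
            if expected in St2:        # antecedent of clause [2] is true
                holds.discard(2)
            if result in St1:          # antecedent of clause [3] is true
                holds.discard(3)
    return holds


def is_consistent(St1, St2, Q2, m, t):
    """t(q,st)(st) is a state whenever q is consistent with m(st)."""
    return all(run(t(q, st), st) in St1
               for st in St1 for q in Q2 if q(m(st)) in St2)


# Toy model (hypothetical): a counter whose legal values are 0..2.
St1 = St2 = {0, 1, 2}
increment = lambda v: v + 1            # the view operation q
capped = lambda st: min(st + 1, 2)     # base operation that stops at 2
types = mapping_types(St1, St2, [increment],
                      m=lambda st: st, t=lambda q, st: [capped])
```

In this toy model the diagram fails only at st = 2, where the view
operation itself would leave St2; the translation is therefore of
type 2 but of neither type 1 nor type 3, and it is consistent.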




There is an interesting interpretation which can be applied to
the concepts of consistent state mapping and consistent operation
mapping. If a consistent state mapping m exists in Ms(s1,s2),
then we can say that the constraints of s1 imply those of s2.
For whenever a structure str1 ∈ Str1(s1) satisfies the
s1-constraints (is an element of St1(s1)), the image m(str1)
satisfies the s2-constraints (is an element of St2(s2)).
Conversely, if a consistent operation mapping t exists in
Mq(s1,s2), where m is consistent, then we can say that the
constraints of s2 imply those of s1 (or at least that they imply
the relevant ones). For whenever q(m(st1)) satisfies the
s2-constraints, t(q,st1)(st1) satisfies the s1-constraints.

2.4. Nondeterministic Operations

In the two kinds of models defined so far, the operations were
functions. This means that, if on two occasions a user lists
his/her view, performs an update, and then lists the view again,
then whenever the before-states are the same, the after-states
will also be the same. We now extend the definitions to cases
where the operations are nondeterministic, i.e., where the
after-states corresponding to the same before-states may vary
from run to run.

First  we  justify  this  extension. 

A "connection trap" in the relational model is a situation in
which a relation has been projected and the projections have been
rejoined, but in which the join contains extraneous tuples, i.e.,
tuples not in the original relation. The term can also be
applied to views (see below). The "connection trap" in
relational systems is traditionally viewed as something
undesirable and something to be avoided. We consider this
judgement too restricting in general. (An example of a useful
connection trap is given in Chapter 4.) One reason for the
negative connotation of "connection traps" is that they cause
non-determinism in the data model. Consider, for example, the
following states of a relational database where T = R[2=1]S:

(a)    R          S          T
       ∅          ∅          ∅

(b)    R          S          T
       1 2        2 3        1 2 2 3

(c)    R          S          T
       0 2        ∅          ∅

(d)    R          S          T
       0 2        2 3        0 2 2 3
       1 2                   1 2 2 3

In both (a) and (c), the state of the view T is the same, yet
insertion of the tuple (1,2,2,3) into T (by inserting (1,2) into
R and (2,3) into S) produced different results ((b) and (d),
respectively).
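The four states can be reproduced with a few lines of code. This is a
sketch under the assumption that state (a) has R and S empty and that
state (c) has R = {(0,2)} and S empty, which is how we read the
figure; relations are modelled as Python sets of tuples and the
function names are ours.

```python
def view(state):
    """T = R[2=1]S: join column 2 of R with column 1 of S."""
    r, s = state
    return {(a, b, c, d) for (a, b) in r for (c, d) in s if b == c}


def insert_into_view(state, tup):
    """Translate insert(T,(a,b,c,d)) as insert(R,(a,b)); insert(S,(c,d))."""
    r, s = state
    a, b, c, d = tup
    return (r | {(a, b)}, s | {(c, d)})


state_a = (set(), set())               # (a): R and S both empty
state_c = ({(0, 2)}, set())            # (c): (0,2) in R is invisible in T

state_b = insert_into_view(state_a, (1, 2, 2, 3))
state_d = insert_into_view(state_c, (1, 2, 2, 3))
```

The views of state_a and state_c coincide, while those of state_b and
state_d differ: the tuple (0,2) hidden in R joins with the newly
inserted (2,3) and produces the extraneous view tuple (0,2,2,3).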

Nondeterminism of data model operations is a generalization of
the connection trap. When mappings mask part of the database
from the Dm2-view, it may be necessary to allow operations on the
view to be nondeterministic. This is because the invisible parts
of the underlying database may change, causing different
interactions with the same view operations. For example,
database procedures may not be visible, but they may be invoked
by view operations. It is therefore justified to generalize the
definitions of data model and mapping model to include
nondeterminism. To do this, we replace operation functions
Str(s)->Str(s) by binary relations which are subsets of
Str(s) x Str(s). The functionality of mappings remains the same,
but the commutative diagram must be generalized. To see what the
appropriate change should be, suppose s1 ∈ Sch1, s2 ∈ Sch2, m ∈
Ms(s1,s2), t ∈ Mq(s1,s2) and that st ∈ St1(s1). Formally,
t(q,st) is a tuple of operations (q1,...,qn). When the
commutative diagram for deterministic data models was given, we
viewed this tuple as a composition of functions qn°...°q1 in
order to keep the notation simple. It is now natural to view
(q1,...,qn) as a composition q1°...°qn of binary relations. (The
traditional notation for functional and for relational
compositions is inconsistent. For binary relations, r°s is
defined by (a,b) ∈ r°s iff for some c, (a,c) ∈ r and (c,b) ∈ s.)
Now, rather than requiring, as in the deterministic case, that
the "expected" result, which is q(m(st)), equals the "actual"
result, which is m(t(q,st)(st)), we say that every possible
"actual" result is also an allowable "expected" result. More
precisely, the set [st]t(q,st) represents all possible Dm1-states
after executing the translation t(q,st). (If r is a binary
relation, and a is in the domain of r, b ∈ [a]r if and only if
(a,b) ∈ r.) Applying the state mapping, we get m([st]t(q,st)) as
the set of all possible Dm2-views from executing the operation
mapping, where we are applying m as a set function. The s2-user
expects a result which is a member of [m(st)]q, and therefore,
the condition m([st]t(q,st)) ⊆ [m(st)]q is required. This
formula is a direct generalization of the commutative diagram.
According to this discussion, the following definitions are made:

Definition 2.5. A nondeterministic data model is a 4-tuple
(Sch,Str,St,Q) such that Sch is a set, and Str, St and Q are
set-valued functions on Sch where for each s ∈ Sch, St(s) ⊆ Str(s)
and if q ∈ Q(s), then q is a binary relation over
Str(s) x Str(s).

□

Definition 2.6. A nondeterministic mapping model Mm from Dm1 to
Dm2, which are nondeterministic data models, is a pair (Ms,Mq)
such that Ms and Mq are set-valued functions defined on
Sch1 x Sch2. If m ∈ Ms(s1,s2), then m is a function
Str1(s1)->Str2(s2). If t ∈ Mq(s1,s2), then t is a function
Q2(s2) x Str1(s1)->Q1(s1)*. A mapping m ∈ Ms(s1,s2) is
consistent if m restricts to a function St1(s1)->St2(s2). Given
consistent m, the operation mapping t is type i (i=1,2,3) with
respect to m if the corresponding condition below holds:

[1] if st ∈ St1(s1), then m([st]t(q,st)) ⊆ [m(st)]q.

[2] if st ∈ St1(s1) and [m(st)]q ⊆ St2(s2),
    then m([st]t(q,st)) ⊆ [m(st)]q.

[3] if st ∈ St1(s1) and [st]t(q,st) ⊆ St1(s1),
    then m([st]t(q,st)) ⊆ [m(st)]q.


□
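Clause [1] of the nondeterministic definition can be exercised on a
small finite model. The following is a sketch under our own
assumptions: binary relations are Python sets of (before, after)
pairs, t(q,st) returns a list of such relations, and the two-component
toy database at the end (a visible and a hidden bit) is hypothetical.

```python
def image(rel, a):
    """[a]r: every b with (a,b) in the binary relation rel."""
    return {b for (x, b) in rel if x == a}


def compose(r, s):
    """r°s: (a,b) iff (a,c) in r and (c,b) in s for some c."""
    return {(a, b) for (a, c) in r for (c2, b) in s if c == c2}


def compose_all(ops, universe):
    """q1°...°qn; the empty sequence yields the identity relation."""
    rel = {(x, x) for x in universe}
    for op in ops:
        rel = compose(rel, op)
    return rel


def satisfies_clause_1(Str1, St1, Q2, m, t):
    """Check m([st]t(q,st)) ⊆ [m(st)]q for all st in St1, q in Q2."""
    for st in St1:
        for q in Q2:
            reached = image(compose_all(t(q, st), Str1), st)
            actual = {m(b) for b in reached}
            if not actual <= image(q, m(st)):
                return False
    return True


# Toy model (hypothetical): structures are (visible, hidden) bit pairs.
Str1 = {(v, h) for v in (0, 1) for h in (0, 1)}
copy_hidden = {((v, h), (h, h)) for (v, h) in Str1}   # base operation
any_value = {(v, w) for v in (0, 1) for w in (0, 1)}  # nondeterministic q
unchanged = {(v, v) for v in (0, 1)}                  # deterministic q
m = lambda st: st[0]                                  # the view hides h
t = lambda q, st: [copy_hidden]
```

With the nondeterministic view operation any_value the inclusion
holds, since any visible bit is an allowable result; with the
deterministic operation unchanged it fails, because the hidden bit can
alter the view.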




2.5. Other Formalisms

In  this  section  we  will  compare  our  definitions  with  two  other 
formalisms.  The  first  is  based  on  abstract  machines,  and  the 
second  on  many-sorted  algebras. 

Robinson and Levitt[RoLe] used an abstract machine approach
for proving properties of hierarchically structured programs.
Although this work mainly deals with verification techniques, it
can be applied to database mappings. Let us first describe some
of the concepts involved.

An abstract machine consists of a set of v-functions (value
functions) and a set of o-functions (operation functions), which
characterize, respectively, the machine's internal state and its
transformation rules. We give below an example of a register
module described as an abstract machine:

integer v-function: LENGTH
    comment: returns the number of occupied positions
             in the register
    initial value: LENGTH=0
    exceptions: none

integer v-function: CHAR(integer i)
    comment: returns the value of the i-th element
             of the register
    initial value: (∀i) CHAR(i)=undefined
    exceptions:
        I_OUT_OF_BOUNDS: i<0 or i>LENGTH

o-function: INSERT(integer i,j)
    comment: inserts the value j after position i,
             moving subsequent values one position higher
    exceptions:
        I_OUT_OF_BOUNDS: i<0 or i>LENGTH
        J_OUT_OF_BOUNDS: j<0 or j>255
        TOO_LONG: LENGTH≥1000
    effects: LENGTH = 'LENGTH'+1
             (∀k) CHAR(k) = if k≤i then 'CHAR'(k)
                            else if k=i+1 then j
                            else 'CHAR'(k-1)

o-function: DELETE(integer i)
    comment: deletes the i-th element of the register,
             moving the subsequent values to fill in the gap
    exceptions:
        I_OUT_OF_BOUNDS: i<0 or i>LENGTH
    effects:
        LENGTH = 'LENGTH'-1
        (∀k) CHAR(k) = if k<i then 'CHAR'(k)
                       else 'CHAR'(k+1)

The exception conditions represent calls which cannot be
handled by the module. The effects section in an o-function
definition describes what the o-function does: v-function values
before the call are enclosed by single quotes, and the values
after the call appear without quotes.
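To make the specification style concrete, the register module can be
rendered as a small class in which v-functions become read-only
methods and o-functions become mutating methods. This is only a
sketch: the class and exception names are ours, positions are taken
to be numbered from 1, and, as an additional assumption not stated in
the specification, CHAR and DELETE reject i=0.

```python
class RegisterError(Exception):
    """Raised when a call matches one of the exception conditions."""


class Register:
    def __init__(self):
        self._chars = []               # position k is kept at index k-1

    def length(self):                  # v-function LENGTH
        return len(self._chars)

    def char(self, i):                 # v-function CHAR
        if i < 1 or i > self.length():     # i=0 rejected (assumption)
            raise RegisterError("I_OUT_OF_BOUNDS")
        return self._chars[i - 1]

    def insert(self, i, j):            # o-function INSERT
        if i < 0 or i > self.length():
            raise RegisterError("I_OUT_OF_BOUNDS")
        if j < 0 or j > 255:
            raise RegisterError("J_OUT_OF_BOUNDS")
        if self.length() >= 1000:
            raise RegisterError("TOO_LONG")
        self._chars.insert(i, j)       # j ends up at position i+1

    def delete(self, i):               # o-function DELETE
        if i < 1 or i > self.length():     # i=0 rejected (assumption)
            raise RegisterError("I_OUT_OF_BOUNDS")
        del self._chars[i - 1]         # later values move down by one
```

The effects clauses correspond to the two list operations: inserting
at index i leaves positions up to i untouched and shifts the rest up,
while deleting index i-1 shifts the later values down.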

An abstract program is a program in which o-function calls
appear as statements and v-function calls appear in expressions
(of the appropriate types). A hierarchically structured program
is a sequence (M0,P0,f0),...,(Mn-1,Pn-1,fn-1),(Mn,Pn). Each Mi
is a machine; each Pi is a set of abstract programs which run on
Mi and which implement machine Mi+1, and each fi is a mapping
function from the state space of Mi to that of Mi+1. Since
v-functions characterize states, mapping functions can be written
as a sequence of definitions, one for each v-function V of Mi+1:

V(a1,...,an) = exp(a1,...,an),

where 'exp' involves the v-functions of Mi. As an example, we
can define the previous register module on top of the following
array module:

integer v-function: ACCESS(integer i)
    comment: returns value of i-th element in array
    initial value: (∀k) ACCESS(k) = if 0<i<100 then 0
                                    else undefined
    exceptions: ARBOUNDS: i<0 or i>1000

o-function: CHANGE(integer i,j)
    effects: (∀k) ACCESS(k) = if k=i then j else 'ACCESS'(k)

To  do  this,  a  mapping  function  is  specified: 

LENGTH = L  (a program variable)

CHAR(i) = if 1≤i≤L then ACCESS(i) else undefined




An  abstract  program  is  also  specified  for  each  o-function  of  the 
register  module: 

procedure INSERT(integer i,j)
    L := L+1;
    n := L;
 1: if n=i+1 then CHANGE(n,j)
    else CHANGE(n,ACCESS(n-1));
         n := n-1;
         goto 1 fi
end

procedure DELETE(integer i)


end
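The construction can be tried out by running the abstract programs on
a simulated array machine. The sketch below makes several assumptions
of its own: the array is a Python list of 1001 cells initialised to 0,
exception checks are omitted, the goto loop of INSERT is written as a
while loop, and the body of DELETE, which is not shown above, is one
possible implementation supplied only for illustration.

```python
class ArrayMachine:
    """Array module: v-function ACCESS and o-function CHANGE."""

    def __init__(self, size=1001):     # size is an assumption
        self._cells = [0] * size

    def access(self, i):
        return self._cells[i]

    def change(self, i, j):
        self._cells[i] = j


class RegisterOnArray:
    """Abstract programs for the register, run on an ArrayMachine."""

    def __init__(self):
        self.array = ArrayMachine()
        self.L = 0                     # the program variable L

    def insert(self, i, j):
        self.L += 1
        n = self.L
        while n != i + 1:              # plays the role of 'goto 1'
            self.array.change(n, self.array.access(n - 1))
            n -= 1
        self.array.change(n, j)

    def delete(self, i):
        # Not given in the text: shift later values down (assumption).
        for n in range(i, self.L):
            self.array.change(n, self.array.access(n + 1))
        self.L -= 1

    # Mapping function: register v-functions from the array state.
    def length(self):
        return self.L

    def char(self, i):
        return self.array.access(i) if 1 <= i <= self.L else None
```

Here length and char play the part of the mapping function: they
compute the register-level v-function values from the lower-level
state, with None standing for "undefined".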

Now let us compare these concepts with the ones we have
defined in the preceding sections of this chapter.

Abstract machines correspond to data models (with a given
schema). As we have mentioned earlier, making the set of all
schemas a component of a data model means that a data model is
like a family of abstract machines. We have suppressed the
query-like statements in our data model definition, but it is
queries which correspond to v-functions. We cannot, however,
easily identify database states with the set of results given by
queries as abstract machine states can be identified with the set
of v-function values. This will be shown by an example shortly.
What Robinson and Levitt call mapping functions correspond to our
structure mappings. Because of the different interpretations of
the formalisms, the restrictions on these functions are
different. Robinson and Levitt specify that mapping functions
need not be total, but must be onto. Our structure mappings are
total, but need not be onto. The difference is that an
implementation of an abstract program must be able to produce
results for every top level input (the onto property), but not
every bottom level state need correspond to a higher level state
since it is the higher levels which "use" the lower levels. On
the other hand, every state of a conceptual database must be
visible as a state in an external view (the total property) since
otherwise the view would be undefined or some view constraint
would be violated. At the same time, external schemas describe
what is possible, not what must be, hence the structure mapping
need not be onto.

Abstract programs correspond to our definition of operation
mapping. Given an initial abstract machine state, an abstract
program determines a sequence of state transitions according to
the execution path followed in the program. Thus the
functionality Q2(s2) x St1(s1)->Q1(s1)*, which can also be
written Q2(s2)->[St1(s1)->Q1(s1)*], is just an "outside" view of
an abstract program.

At a more detailed level, there are problems with using
abstract machines to describe database concepts. To illustrate,
let us describe the hierarchical portion of the EDBS system[Lols]
as an abstract machine. EDBS is an instructional database
management system which has relational, hierarchical and network
interfaces. The EDBS hierarchical system provides retrieval and
modification commands patterned after IBM's IMS:

GET UNIQUE (GU): used for random retrieval in a database
GET NEXT (GN): used for sequential (preorder) retrieval
GET NEXT WITHIN PARENT (GNP): used for sequential (preorder)
    retrieval in a database subtree
GET HOLD (GHU, GHN, GHNP): performs the analogous GET operation
    and locks the segment retrieved
INSERT: used to add new segments to a database
DELETE: used to remove segments from a database
REPLACE: used to change field values of segments in a database
READ: returns the contents of the buffer
WRITE, REWRITE: modifies the buffer
TYPE: returns the name of the segment type most recently
    retrieved
STATUS: returns the condition code

The components of the state of this abstract machine are: the
database proper, the buffer, the value of TYPE and the value of
STATUS. The o-functions are GU, GN, GNP, GHU, GHN, GHNP, INSERT,
DELETE, REPLACE, WRITE and REWRITE. The v-functions are READ,
TYPE and STATUS. Note that since the v-functions do not
represent the value of the first component of the state (the
contents of the database), we cannot identify the state space
with the set of v-functions. In other words, two different
database states may give the same values for READ, TYPE and
STATUS.

Consider also the parameters of the v- and o-functions of this
EDBS abstract machine. While Robinson and Levitt's formalism
forces a definite choice of parameters, the nature of the EDBS
functions (and of data language statements in general) is not so
clear. The following is a typical EDBS command:

GU 'PRESIDENT WHERE (NAME=JOHN) AND (ADDRESS=MONTREAL)'

As an APL function, GU has one parameter which is a character
string. When we put more "semantics" into our choice, we would
say that the parameters are the segment type (here, PRESIDENT)
and the where-clause. A slightly different situation would lead
to still another choice of parameters:

X <- 'JOHN'

Y <- 'MONTREAL'

GU 'PRESIDENT WHERE (NAME=',X,') AND (ADDRESS=',Y,')'

(Recall that commas are concatenation operators in APL.) Here we
would tend to say that the parameters of the GU command are X and
Y. In other models, such as SQL[ABCE], the identification of
parameters is even more uncertain since whole statements can be
nested.

At this point, then, it seems best to suppress the parameters
of data model operations altogether, which is what we have done
in our formal definition.

Let us mention one more problem with the abstract machine
approach. Schmidt[Schm] has defined data of type relation for
the language Pascal along with some operations for this type.
One of the constructs is a for-statement of the form:

foreach <rec> in <rel> do <stmt>

It is approximately equivalent to the loop:

low(<rel>);
while not aor(<rel>) do
begin <stmt>;
      next(<rel>)
end

Here 'low' sets a pointer to the first tuple in <rel>; 'aor'
returns true if and only if the pointer has been moved past the
last tuple in <rel>, and 'next' moves the pointer to the next
tuple in <rel>. (The two fragments are not exactly equivalent
since the retrieval order in the first case is unspecified, while
it is specified in the second.) While the second case is clearly
describable as an abstract program using v-function 'aor' and
o-functions 'low' and 'next', we cannot describe the first case in
the same fashion.

Paolini and Pelagatti have used many-sorted algebras to define
database mappings[PaPe]. First we will present their formalism,
and then we will discuss it.

A many-sorted algebra is a family of sets (A(s) : s ∈ S) and a
collection E of operations. S is called the sort set; the sets
A(s) are called the carriers, and E is called the signature of
the algebra. An E-algebra is a many-sorted algebra whose
signature is E. Given two E-algebras A and B, an E-homomorphism
h:A->B is a family of functions (h(s):A(s)->B(s) : s ∈ S) mapping
the carriers of A to the corresponding carriers of B such that
the operations are preserved as follows: If f ∈ E and
(a1,...,an) is a tuple in its domain (each ai is in some carrier
set A(s) for some s), then h(f(a1,...,an)) = f(h(a1),...,h(an))
(where the S-subscript was omitted from h). An example of a
many-sorted algebra could be relational algebra[Codd72a], say,
where S = {relation, tuple, descriptor} and E = {UNION,
PROJECTION,...}. However, the algebra used by the authors to
discuss mappings simply has the single type 'database' in the
sort set, and every operation in E has the type
database->database. This means that such algebras model database
states and state transitions. With this notion of database, the
following definitions, leading to the definition of mapping, are
made:

Given two databases D1=(A1,E1) and D2=(A2,E2) (Ai is the
carrier set and Ei is the set of operations), a structural
abstraction is a partial function A1->A2. An operational
translation is a function E2->E1*. Given an operational
translation t, an implementation is an E-algebra (A1,t(E2)).
That is, it has the same carrier set as D1, and its operation set
is the image of E2 under t. A structural abstraction m correctly
denotes the mapping between D1 and D2 if m is an E-homomorphism
between (A1,t(E2)) and D2.

Now let us compare this work with the constructs given in
previous sections of this chapter. The formalism of Paolini and
Pelagatti contains some sophisticated terminology relating to
many-sorted algebras; for example, homomorphisms are used for the
commutative diagram condition. However, their machinery was not
really needed, and, in fact, the final definitions were more
primitive than ours. The concepts of many-sorted algebras are
not needed to describe simple automata, which are what D1=(A1,E1)
and D2=(A2,E2) really are. Unfortunately, the definitions of
mapping cannot readily be extended to arbitrary many-sorted
algebras. First of all, we cannot assume that the two databases
D1 and D2 have the same signature, since external data models may
be different from conceptual ones. Even if we do assume that
their signatures are the same, there are still problems in
extending the definitions of operational translation. In the
definition given above, the range of an operational translation t
is E1*, the set of sequences of operations in E1. Since
everything in E1 has functionality database->database, this set
is well-defined, and all members also have functionality
database->database. When elements of E1 have arbitrary
functionality, we can generalize the '*' operator to take all
formal compositions of composable functions. For example, if E1
contains the functions:

f1: ab->c

f2: c->b

f3: b->a,




then the set E1* would contain the compositions:

f1(-,f2(-)): ac->c

f2(f1(-,-)): ab->b

f3(f2(f1(-,f2(-)))): ac->a, etc.

Note, however, that E1* no longer has the same signature as E1
(or E2). Hence, it is not possible to define a mapping as a
homomorphism.
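The functionality of such formal compositions can be computed
mechanically. The sketch below is ours and rests on one assumption
that should be stated plainly: f2 is given the type c->b, which is
what the three listed compositions require in order to be well-typed.
Terms are nested tuples, and '-' marks an open argument position.

```python
SIG = {                                # name: (argument sorts, result)
    "f1": (("a", "b"), "c"),
    "f2": (("c",), "b"),               # c->b is assumed, see above
    "f3": (("b",), "a"),
}
HOLE = "-"


def functionality(term, expected=None):
    """Return (argument sorts, result sort) of a formal composition.

    A term is HOLE or a tuple (name, arg1, ..., argn). A HOLE takes
    the sort that the enclosing function expects at that position.
    """
    if term == HOLE:
        return [expected], expected
    name, *args = term
    arg_sorts, result = SIG[name]
    if len(args) != len(arg_sorts):
        raise TypeError(f"{name} expects {len(arg_sorts)} arguments")
    holes = []
    for arg, sort in zip(args, arg_sorts):
        inner_holes, inner_result = functionality(arg, sort)
        if inner_result != sort:
            raise TypeError(f"{name} needs sort {sort}, got {inner_result}")
        holes += inner_holes
    return holes, result
```

For instance, functionality(("f1", "-", ("f2", "-"))) yields the
argument sorts a and c with result c, i.e., ac->c. Since ac->c is not
the functionality of any function in E1, the generalized E1* has a
different signature from E1, which is the obstacle noted above.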

Another  problem  is  that  the  operational  translations  are  not 
dependent  on  the  carrier  set;  that  is,  they  do  not  depend  on  the 
state  of  the  conceptual  database.  As  we  have  remarked,  this 
restriction  should  not  be  made  in  a  general  definition  since  not 
all  operation  mappings  will  be  state  independent. 

2.6.  Summary  and  Conclusions 

We have defined a data model as a 4-tuple (Sch,Str,St,Q),
where Sch represents schemas; Str represents "states"
(structures) of the database (relative to a schema) in which some
constraints may possibly be violated; St represents the legal
states, and Q represents the data model operations. We also gave
a nondeterministic generalization of this definition. The
presence of the two components Str and St implicitly includes the
notion of constraint since anything in Str(s)-St(s) violates
constraints (implicitly) in schema s, and anything in St(s)
satisfies the constraints in s.

We defined a mapping model Mm from data model Dm1 to data
model Dm2 to be a pair (Ms,Mq). If s1 ∈ Sch1 and s2 ∈ Sch2, then
Ms(s1,s2) represents the set of all mappings from structures of
schema s1 to structures of schema s2. The "good" mappings are
the ones which take states to states.

If s1 ∈ Sch1 and s2 ∈ Sch2, then Mq(s1,s2) represents the set
of all operation mappings, i.e., interpretations of s2-operations
as s1-operations. Operation mappings have the functionality
Q2(s2) x St1(s1)->Q1(s1)*, and we saw that this is equivalent to
an abstract program which makes calls to s1-operations. The
"good" operation mappings (with respect to a state mapping)
satisfy a commutative diagram represented also by the equation
q(m(st)) = m(t(q,st)(st)). We defined three conditions
significant to database applications when this equality might
hold.

The condition requiring the least from the operation mapping
specifies that the commutative diagram be satisfied when both the
before- and after-structures of the underlying database satisfy
all constraints. This condition is useful when the underlying
database has its own constraint maintenance modules.

The next condition specifies that the commutative diagram be
satisfied whenever the before-structure of the underlying
database is a state and the after-structure of the view is a
state. This condition is useful when the correctness of the
application program using the view has been verified.

The condition requiring the most of the operation mapping will
have the commutative diagram satisfied as long as the
before-structure of the underlying database is a state.

We also defined two conditions involving the satisfaction of
constraints.

Two other formalisms for mappings have been described in the
literature, and we compared these with the one presented in this
chapter. We argued that our definitions are better suited for
describing the significant aspects of database mappings.

Formalizations are useful for two purposes: to provide a
framework for mathematically proving things, and to provide a
precise set of concepts on which to base further work. In this
chapter we have not proved any theorems, but we shall show in
subsequent chapters that the concepts introduced can be a
take-off point for getting significant results.

One of the important concepts introduced has been that of
mapping model. At the same level of abstraction as data model,
the concept of mapping model provides a means for expressing
global questions about mappings. An example follows:

    Suppose Mm1=(Ms1,Mq1) and Mm2=(Ms2,Mq2) are two
    mapping models, both from Dm1 to Dm2. Mm2 might represent
    a very powerful but unimplemented language such as set
    theory, and Mm1 might represent a simple but implemented
    language. When is Mm1 complete with respect to Mm2? That
    is, given schemas s1 ∈ Sch1 and s2 ∈ Sch2 and a consistent
    mapping m2 ∈ Ms2(s1,s2), does there exist a consistent
    mapping m1 ∈ Ms1(s1,s2)? Also, given a state mapping m ∈
    Ms1(s1,s2) which is also in Ms2(s1,s2) such that there
    exists an operation mapping t2 ∈ Mq2(s1,s2) with property
    p (where p is "type 1", "type 2", etc.), does there exist
    an operation mapping t1 ∈ Mq1(s1,s2) which also has
    property p?

In the succeeding chapters, we will be looking at more local
(and more tractable) questions: Under what conditions is a
structure mapping consistent? When does an operation mapping
have the type 3 property? The type 1 property?




CHAPTER 3


Predicate  Calculus  and  Mappings 


In  the  previous  chapter,  some  abstract  definitions  of  data 
models  and  mappings  were  given.  These  definitions  specified  the 
fundamental  structure  which  data  models  and  mappings  possess. 

What we would like to do in this chapter is to use Predicate
Calculus to give more insight into these structures.

Predicate Calculus is a useful tool for the following reasons:
It is well defined and has been studied by logicians for a long
time. There is a very close analogy between Predicate Calculus
(PC) axioms and interpretations on the one hand and database
schemas and structures on the other[WoMy]. Also, the relational
data model (with normalized relations) is very similar to first
order Predicate Calculus[Codd72a].

This  chapter  contains  five  sections.  In  Section  1  we  record 
without  much  discussion  the  basic  concepts  and  definitions  of 
First  Order  Predicate  Calculus.  Then  in  Section  2  we  compare 
these  PC  concepts  with  an  abstract  relational  data  model  and 
relational  structure  mappings.  We  see  that  schemas  correspond  to 
theories,  and  queries  correspond  to  formulas.  Relational 
structures  according  to  Definition  2.3  correspond  to  the 
structures  of  PC;  relational  states  are  the  "models"  of  PC. 

These  relationships  have  been  noted  before,  though  not  so 
explicitly.  An  original  contribution  of  this  chapter  is  the 
analogy  drawn  between  views  and  mappings  on  the  one  hand  and 
defined  theories  and  translations  by  defining  axioms  on  the 
other.  This  analogy  allows  a  PC  theorem  relating  truth  and 
validity  in  defined  theories  to  truth  and  validity  in  the 
original  theory  to  be  translated  into  a  theorem  about  the 
consistency  of  database  structure  mappings.  Section  3  gives  some 
examples  of  the  use  of  this  theorem. 




In Section 4 we note how basic Predicate Calculus can be
applied  to  model  more  than  just  the  static  properties  of  database 
mappings;  we  show  how  statements  about  allowable  transitions  can 
be  expressed  as  first  order  axioms. 

Section  5  summarizes  the  results  of  the  chapter. 

3.1. First Order Logic

In  this  section  we  present  some  basic  definitions  and  theorems 
of  first  order  logic.  The  discussion  will  be  brief  since  we 
assume  prior  knowledge  of  the  subject.  The  details  may  be  found 
in any standard text on Mathematical Logic (e.g., [BeSl], [Klee],
[Shoe]).

A  first  order  language  L  contains  the  following  sets  of 
symbols: 

(i) a countable set of variables {v0, v1, v2, ..., vj, ...},

(ii) for each nonnegative integer n, a set Pn of n-ary
predicate symbols, (The letter P will denote an arbitrary element
of Pn.)

(iii) the equality predicate =, which is an element of P2,

(iv) for each positive integer n, a set Fn of n-ary function
symbols, (The letter f will denote an arbitrary element of Fn.)

(v) a set C of constant symbols, (The letter c will denote an
arbitrary element of C.)

(vi) the symbols ¬, +, ∃, where ¬ is negation, + is
disjunction, ∃ is the existential quantifier,

(vii) brackets, parentheses, commas, etc.

Note that the sets Pi are assumed to be disjoint, and it is
possible that any except P2 is empty.

The  language  L  also  contains  the  set  Tm  of  terms,  defined 
inductively  as  follows: 

(i)  variables  and  constant  symbols  are  terms, 

(ii) if f ∈ Fn and t1,...,tn ∈ Tm, then f(t1,...,tn) ∈ Tm.

Lastly,  L  contains  the  set  F  of  formulas,  also  defined 
inductively: 

(i) if P ∈ Pn and t1,...,tn ∈ Tm, then P(t1,...,tn) is an
atomic formula and atomic formulas are in F,

(ii) if p and q are formulas, then so are ¬p and (p+q),

(iii) if p is a formula, then so is (∃vj)p for any j=0,1,2,...

It  is  useful  and  customary  to  introduce  defined  symbols  in  the 
following  way: 

(i) write p•q for ¬(¬p+¬q), (• is conjunction),

(ii) write p=>q for ¬p+q,

(iii) write p≡q for (p=>q) • (q=>p),

(iv) write (∀vj)p for ¬(∃vj)¬p, (∀ is the universal
quantifier.)

An occurrence of vj in p is bound if this occurrence is in a
part of p of the form (∃vj)q, otherwise the occurrence is free.

We  say  vj  is  bound  in  p  if  vj  has  a  bound  occurrence  in  p. 
Similarly,  vj  is  free  in  p  if  it  has  a  free  occurrence  in  p. 

Note  that  a  variable  may  be  both  bound  and  free  in  a  formula. 

For  terms  t  and  s,  the  symbol  t[s/j]  denotes  the  term  obtained 
from  t  by  replacing  each  occurrence  of  vj  in  t  by  s.  If  x  is  a 
sequence (t0,t1,...) of terms, then t[x] denotes the term
obtained from t by simultaneously replacing each occurrence of vj
by tj for j=0,1,2,... The term t[x] is called an instance of t.

If  p  is  a  formula  and  t  is  a  term,  we  say  that  t  is  free  for 
vj  in  p  if  for  each  variable  vi  occurring  in  t,  no  part  of  p  of 
the form (∃vi)q contains an occurrence of vj which is free in p.

If t is free for vj in p, then p[t/j] denotes the formula
obtained from p by replacing each free occurrence of vj by t. If
x is a sequence (t0,t1,...) of terms such that each tj is free for
vj in p, then p[x] denotes the formula obtained from p by
simultaneously replacing each free occurrence of vj by tj for
j=0,1,2,... We call p[x] an instance of p. We will write p[t]
for p[t/j] when no confusion can arise.

We say that a formula p' is a variant of p if p' can be
obtained from p by a sequence of replacements of the following
type: replace a part (∃vj)q by (∃vk)q[vk/j], where vk is not
free in q.

The  logical  axioms  of  L  are  a  distinguished  set  of  formulas  of 
L.  Let  p,  q,  r  be  any  formulas.  Then  the  following  formulas  are 
logical  axioms  (we  omit  parentheses  when  no  confusion  results) : 




(i) p => (q => p),

(ii) (p => (q => r)) => ((p => q) => (p => r)),

(iii) (¬p => ¬q) => (q => p),

(iv) (p•q) => p,

(v) (p•q) => q,

(vi) (r => p) => ((r => q) => (r => (p•q))),

(vii) p => (p+q),

(viii) q => (p+q),

(ix) (p => r) => ((q => r) => ((p+q) => r)),

(x) p[t/j] => (∃vj)p, where t is free for vj in p,

(xi) (∀vj)(p => q) => (p => (∀vj)q), where p contains
no free occurrence of vj,

(xii) (∀vj)(vj = vj),

(xiii) (∀vj)(∀vk)(((vj = vk) • r) => r'), where vk is free
for vj in r and r' results from r by replacing some,
but not necessarily all, free occurrences
of vj by vk,

(xiv) (∀vj)(∀vk)((vj = vk) => (t = t[vk/j])), for any term t.

Axioms (i) through (ix) are called propositional axioms.
Axioms (x) and (xi) are, respectively, substitution and ∀-axioms.
We call (xii) to (xiv) equality axioms.

Next we define the rules of inference. There are two of them:

(i)  for  any  formulas  p  and  q,  q  is  an  immediate  consequence 
of  p  and  p=>q , 

(ii)  for  any  formula  p  and  any  variable  vj,  (Vvj)  p  is  an 
immediate  consequence  of  p. 

It should be noted that this choice of axioms and inference
rules is not at all unique. There are many equivalent
formulations.

We can now define what we mean by a first order theory. A
first order theory T, called simply a theory, consists of three
things:

(i) a first order language L(T), called L when T is
understood,

(ii) a set of axioms, the logical axioms of L and other,
nonlogical axioms,

(iii) the inference rules for L.




A theory is completely specified by giving its language and
nonlogical axioms. In the future, when we refer to the axioms of
a theory we will mean its nonlogical axioms.

The  theorems  of  a  theory  T  are  defined  inductively  as  follows: 

(i)  the  logical  and  nonlogical  axioms  of  T  are  theorems, 

(ii)  the  immediate  consequences  of  theorems  are  theorems. 

If p is a theorem of T, we will write T ⊢ p.

Theorem 3.1. The following are theorems of PC (where '.=>.' is
the outermost '=>'):

(i) (p=>q) • (q•r=>s) .=>. p•r=>q•s

(ii) p => ((p=>q) => q)

(iii) (p=>q) .=>. (p•r => q•r)

(iv) (p=>q) • (r=>s) .=>. (p•r => q•s)

(v) (p=>q) .=>. (p => p•q)

(vi) p • (∃vj)q => (∃vj)(p•q), where vj is not free in p

(vii) (p=>q) .=>. (∃vj)p => (∃vj)q

(viii) (p=>q) .=>. (p•r => q)

Proof. See [BeSl] and [Klee].

□


A language L' is an extension of a language L if

(i) Pn ⊆ P'n, for each n,

(ii) Fn ⊆ F'n for each n, and

(iii) C ⊆ C'.

A theory T' is an extension of T if L(T') extends L(T) and
every theorem of T is a theorem of T'.

We now consider the question of defining new predicate symbols
in a theory. Suppose a theory T is given, and let d be a formula
of T whose free variables are v1,...,vn. An extension T' of T is
formed by adding a new n-ary predicate symbol P and a new
nonlogical axiom P(v1,...,vn) ≡ d, called the defining axiom of P.
Now given a formula q of T', we obtain a formula q* of T,
called the translation of q into T as follows: We select a
variant d' of d in which no variable of q is bound and replace
each occurrence P(t1,...,tn) in q by d'[t1/1,...,tn/n]. Then the
following is true:

Theorem 3.2. Let T be a theory, and let T' be the extension of T
by the axiom P(v1,...,vn) ≡ d. Then T' ⊢ q if and only if T ⊢ q*,
where q* is the translation of q into T by the defining axiom.

Proof. See [Shoe]. We will give a semantic version of this
theorem in the next section.

□

Given  a  language  L,  a  structure  S  for  L  is  a  nonempty  set  U, 
the  underlying  set  of  S,  together  with  the  following: 

(i) for each P ∈ Pn, an n-ary relation P on U with the
restriction that the binary relation associated with the symbol
'=' is the equality relation,

(ii) for each f ∈ Fn, a function f: U**n -> U,

(iii) for each c ∈ C, an element c ∈ U.

Structures  give  interpretations  (truth  values)  to  the  formulas 
of  L.  Although  we  are  ultimately  only  interested  in  the  truth 
value  of  sentences,  it  is  advantageous,  for  example,  in  proofs  by 
induction,  to  also  define  truth  values  of  formulas.  Since  a 
formula p may contain free variables, we need to assign an
element of U to each free variable in p. We do this by
specifying a valuation x in S which is just a sequence
(a0,a1,...) of elements of U.

Let  us  first  define  what  we  mean  by  the  value  [t,S,x]  of  the 
term  t  in  structure  S  with  respect  to  valuation  x.  This  is  done 
by  induction: 

(i) [vj,S,x] = xj,

(ii) [c,S,x] = c,

(iii) [f(t1,...,tn),S,x] = f([t1,S,x],...,[tn,S,x]).

Now we define when a formula p is true in S with respect to x,
and we write S,x ⊨ p or S ⊨ p[x] in this case:

(i) if P ∈ Pn and t1,...,tn are terms, then
S,x ⊨ P(t1,...,tn) if and only if ([t1,S,x],...,[tn,S,x]) ∈ P,

(ii) S,x ⊨ ¬p if and only if not S,x ⊨ p,

(iii) S,x ⊨ p+q if and only if S,x ⊨ p or S,x ⊨ q,

(iv) if y ∈ U, we let x(y/j) be the valuation which is the
same as x except for having y in the j-th place, then
S,x ⊨ (∃vj)p if and only if for some y ∈ U, S,x(y/j) ⊨ p.

From (ii), (iv) and the definition of the universal
quantifier, we see that S,x ⊨ (∀vj)p if and only if for all y ∈
U, S,x(y/j) ⊨ p. It is also easy to see that if x and x' are
valuations such that xj = x'j whenever vj occurs free in p, then
S,x ⊨ p if and only if S,x' ⊨ p. In other words, whether or not
S,x ⊨ p depends only on the elements of x which correspond to
free variables of p. Thus we are allowed, when convenient, to
write S ⊨ p[xk,...,xn] in place of S ⊨ p[x] when the free
variables of p are among vk,...,vn. If p is a sentence, we may
write S ⊨ p without referring to any valuation at all.
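
The inductive definitions of term value and truth can be made concrete for finite structures. The following is a minimal sketch, not part of the original development: the nested-tuple encoding of terms and formulas, the dictionary layout of a structure, and the names term_value, holds and forall are all assumptions introduced for illustration. A valuation is represented as a dictionary from variable indices to elements of U, which suffices because truth depends only on the free variables.

```python
# Encoding assumed for this sketch (not from the text):
#   terms:    ("var", j) | ("const", name) | ("fn", name, [terms])
#   formulas: ("pred", name, [terms]) | ("not", p) | ("or", p, q)
#             | ("exists", j, p)
# A structure S is a dict with keys "U", "preds", "funcs", "consts".


def term_value(t, S, x):
    """Value [t,S,x] of term t in structure S under valuation x."""
    kind = t[0]
    if kind == "var":
        return x[t[1]]
    if kind == "const":
        return S["consts"][t[1]]
    if kind == "fn":
        args = [term_value(a, S, x) for a in t[2]]
        return S["funcs"][t[1]](*args)
    raise ValueError(f"unknown term: {t!r}")


def holds(p, S, x):
    """Decide S,x |= p by induction on p (U must be finite)."""
    kind = p[0]
    if kind == "pred":
        args = tuple(term_value(t, S, x) for t in p[2])
        if p[1] == "=":
            # '=' is always interpreted as the equality relation.
            return args[0] == args[1]
        return args in S["preds"][p[1]]
    if kind == "not":
        return not holds(p[1], S, x)
    if kind == "or":
        return holds(p[1], S, x) or holds(p[2], S, x)
    if kind == "exists":
        j, body = p[1], p[2]
        # x(y/j): same as x except for having y in the j-th place.
        return any(holds(body, S, {**x, j: y}) for y in S["U"])
    raise ValueError(f"unknown formula: {p!r}")


def forall(j, p):
    """Defined symbol: (forall vj)p stands for not (exists vj) not p."""
    return ("not", ("exists", j, ("not", p)))


S = {"U": {0, 1, 2}, "preds": {"R": {(0, 1), (1, 2)}},
     "funcs": {}, "consts": {}}
has_successor = ("exists", 1, ("pred", "R", [("var", 0), ("var", 1)]))

print(holds(has_successor, S, {0: 0}))
print(holds(has_successor, S, {0: 2}))
print(holds(forall(0, has_successor), S, {}))
```

The last call evaluates a sentence, so the empty valuation is enough, in line with the remark that a sentence can be evaluated without referring to any valuation.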

We say a formula p is valid in S if S,x ⊨ p for all valuations
x. We may write S ⊨ p when p is valid in S.

Now let T be a theory over L. A relational structure S is a
model of T if for every axiom p of T, S ⊨ p. (This is not to be
confused with the concept of 'data model'.) A formula p is valid
in T if for every model S of T, S ⊨ p. The relations between
being consistent and having a model on the one hand and between
being valid and being provable on the other are given by the two
forms of the Completeness Theorem:

Theorem 3.3.

(i) A theory T is consistent if and only if T has a model.

(ii) A formula p is valid in a theory T if and only if p is
provable in T.

Proof. See [BeSl].

□ 


The  Completeness  Theorem  tells  us  that  the  notion  of 
provability,  which  is  a  syntactic,  symbolic  concept,  correctly 
captures  the  semantic  concept  of  truth.  The  question  of  whether 
we  can  devise  an  algorithm  for  proving  (or  refuting)  PC  sentences 
has  a  negative  answer: 




Theorem  3.4.  The  decision  problem  for  Predicate  Calculus  is 
unsolvable.  That  is,  there  is  no  decision  procedure  for 
determining  whether  a  PC  formula  is  provable  in  the  calculus. 

Proof. See [Klee].

□


3.2. Predicate Calculus and the Relational Model

Now  let  us  see  how  these  formal  logic  concepts  relate  to 
database  concepts,  specifically  to  the  relational  approach. 

It is fairly clear that relation names correspond to predicate
symbols. When we view a predicate symbol P as a relation name,
we can interpret the atomic formula P(t1,...,tn) as a statement
that tuple (t1,...,tn) is in relation P.

Constraints on relations correspond to the nonlogical axioms
of a theory. For example, a functional dependency constraint
from domain 1 to domain 2 in a binary relation R can be written
as a PC axiom (∀v1)(∀v2)(∀v3)(R(v1,v2) • R(v1,v3) => v2 = v3).
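
On a finite relation, this axiom can be checked directly by quantifying over all pairs of tuples. The sketch below is only an illustration; the function name satisfies_fd_1_to_2 and the sample data are assumptions, not part of the text.

```python
def satisfies_fd_1_to_2(R):
    """Check R(v1,v2) and R(v1,v3) => v2 = v3 for all tuples of a finite R."""
    return all(v2 == v3
               for (v1, v2) in R
               for (u1, v3) in R
               if v1 == u1)


# Hypothetical data: each employee name determines one department.
employees = {("ann", "sales"), ("bob", "sales")}

print(satisfies_fd_1_to_2(employees))
print(satisfies_fd_1_to_2(employees | {("ann", "hr")}))
```

The second call adds a tuple that gives "ann" two departments, so the constraint is violated there.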

The  schema  or  database  description  corresponds  to  a  theory. 
That  is,  relational  schemas  (at  least  at  an  abstract  level) 
consist  of  two  parts:  a  name  part  and  a  constraint  part.  The 
name  part  specifies  the  corresponding  first  order  language,  that 
is, the sets Pn, n=0,1,2,..., of predicate symbols. The
constraint  part  specifies  the  corresponding  nonlogical  axioms. 

A schema specifies a set of allowable database states. For
example, if there is a ternary relation name R in the schema, the
database state will contain a set of 3-tuples which is the
(current) interpretation of R. Since a legal state of a database
must not violate the constraints in the schema, database states
correspond to models of the corresponding theory. We must make
the restriction, however, that the models are finite, i.e., if S
is a model of a theory (schema), then for each predicate P, P in S
is finite.

Queries simply correspond to formulas[Mink]. (Compare also
with Codd's Relational Calculus[Codd72a].) If a formula has n≥1
free variables, then the result of the query is a set of
n-tuples. If the formula is a sentence, then the result is "true"
or "false". We must ensure, however, that the result of any
query is finite. For example, the query ¬R(v1,v2), which gives
all tuples not in R, will result in an infinite set. In Codd's
Relational Calculus, the finiteness of query results is
guaranteed by restricting the formulas to be range separable. We
will not introduce an analogous restriction here for two reasons.
First, all our examples of formulas considered as queries will
give finite results, and, second, the problem of giving necessary
and sufficient conditions for a formula always to give finite
results on finite models may be unsolvable. We have no proof of
this, but consider a formula q of the form
(p=>R(v1,...,vn)) • (¬p=>¬R(v1,...,vn)), where p is a sentence and
R is a predicate symbol. If T ⊢ p, i.e., if p is provable, then
for any structure S, q will always give the result R in S, while
if p is not provable, q will on some models give the result R~
(the complement U**n − R) which is infinite. (To state this as a
useful theorem, we would need to prove that not T ⊢ p implies there
is a finite model in which ¬p is true.) Since provability is
undecidable, the finite result problem is probably also
undecidable. But as we have stated, we will not worry about this
problem.

The definition of the result of a formula applied as a query
to a model is quite easy to give using the definition of truth
given in Section 3.1:

Suppose p has free variables V = {...,vk,...,vj,...}. Then the
result [p,S] of p on interpretation S is {x : S,x ⊨ p} projected
on V. (Recall that "S,x ⊨ p" does not depend on the components of
the valuation x corresponding to variables not free in p.) It
will also ease the notation to assume that the free variables of
p are exactly v1,...,vn. Then we may write [p,S] = {(x1,...,xn)
: S ⊨ p[x1,...,xn]}.
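
For a finite underlying set, the result [p,S] can be computed by enumerating all n-tuples over U and keeping those that satisfy p. The sketch below is illustrative only: it assumes a formula is supplied as a Python predicate over its n free variables, and the name query_result is hypothetical. It also shows why the query ¬R(v1,v2) is problematic: its result is the complement of R, which is finite here only because U is finite.

```python
from itertools import product


def query_result(p, U, n):
    """Return {(x1,...,xn) in U**n : p(x1,...,xn)} for a finite set U."""
    return {x for x in product(sorted(U), repeat=n) if p(*x)}


U = {0, 1, 2}
R = {(0, 1), (1, 2)}

# The query R(v1,v2) returns R itself.
in_R = query_result(lambda a, b: (a, b) in R, U, 2)

# The query not R(v1,v2) returns the complement U**2 - R.
not_in_R = query_result(lambda a, b: (a, b) not in R, U, 2)

print(sorted(in_R))
print(len(not_in_R))
```

With an infinite underlying set the second query could not be enumerated at all, which is the finiteness concern raised above.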

Now let us consider two relational schemas T1 and T2. That
is, in our PC terminology, T1 and T2 are first order theories.
We would like to define T2 as a view of T1. For simplicity, we
will assume that the constant and function symbols in both
languages are the same. In database terms, this means that both
data models have, for example, arithmetic operations and the same
character sets. Recalling that views in existing relational
database systems which support views are defined by
queries[Ston], and noting that queries correspond to formulas, we
see that to define T2 as a view of T1, we need to specify a set
of defining axioms, one for each predicate symbol (relation name)
of T2. Let A be such a set of defining axioms. This means that
for every P ∈ P2n (the n-ary predicate symbols of T2), there is a
formula in A of the form P(v1,...,vn) ≡ d, where the free
variables of d are v1,...,vn.

Given any structure S1 for T1, we get a structure S2 for T2 by
specifying that for each P in L(T2), P = [d,S1]. In other words,
the valuations (x1,...,xn) such that S2 ⊨ P[x1,...,xn] are
exactly the ones such that S1 ⊨ d[x1,...,xn]. We will also assume
that function and constant symbols are always interpreted
identically. In this case, we will write S2 = A(S1). As we stated
in Chapter 2, the mappings we are interested in are the ones
which map models of T1 to models of T2. The PC machinery we have
presented can give a precise condition for A to be a "consistent"
mapping as in Definition 2.4.

The  next  theorem  is  the  semantic  version  of  Theorem  3.2.  We 
use  it  to  give  the  condition  for  mappings  to  be  consistent. 

Theorem 3.5. Let T1 be a relational schema (PC theory), and let
T2 be a view of T1 by the set A of defining axioms. Let S1 be a
structure of T1, and let p be a formula of T2. Then

(i) for any valuation x, A(S1),x ⊨ p if and only if S1,x ⊨ p*,
and

(ii) [p,A(S1)] = [p*,S1], where p* is the translation of p
into T1 by A.

Proof. Part (ii) is an immediate consequence of part (i). The
two clauses are actually equivalent, but it is more convenient
for an induction proof to do part (i) first. This first part we
prove by induction on the length of p. If P ∈ P2n, say with
P(v1,...,vn) ≡ d in A, then for any x, A(S1),x ⊨ P(v1,...,vn) if
and only if (x1,...,xn) ∈ P. Since P = [d,S1] = {(y1,...,yn) :
S1 ⊨ d[y1,...,yn]}, this is true if and only if
S1 ⊨ d[x1,...,xn].

For disjunction, we have A(S1),x ⊨ p+q if and only if
A(S1),x ⊨ p or A(S1),x ⊨ q, which is true if and only if
S1,x ⊨ p* or S1,x ⊨ q*, i.e., if and only if S1,x ⊨ (p+q)* (p*+q*
is the same as (p+q)*).

Next, A(S1),x ⊨ ¬p if and only if not A(S1),x ⊨ p, which is
true if and only if not S1,x ⊨ p*, i.e., if and only if
S1,x ⊨ (¬p)* (since ¬(p*) is (¬p)*).

Finally, A(S1),x ⊨ (∃vj)p if and only if for some y ∈ U,
A(S1),x(y/j) ⊨ p, which is true if and only if S1,x(y/j) ⊨ p*,
i.e., if and only if S1,x ⊨ ((∃vj)p)* (here, ((∃vj)p)* is the
same as (∃vj)(p*)).

□

Theorem 3.6. Let T1 be a relational schema (PC theory), and let
T2 be a view of T1 by the set A of defining axioms. Then the
structure mapping determined by A is consistent if and only if
for every axiom p of T2, p* is valid in T1, where p* is the
translation of p into T1 by A.

Proof. Suppose A defines a consistent mapping. Let S1 be any
model for T1, and let p be an axiom of T2. Then A(S1) is a model
for T2, and so for any valuation x, A(S1),x ⊨ p. By the previous
theorem, S1,x ⊨ p*. Thus p* is valid in T1.

Now suppose p* is valid in T1, for each axiom p of T2. Then
for every model S1 of T1 and every valuation x, S1,x ⊨ p*. By
the previous theorem, A(S1),x ⊨ p, i.e., A(S1) is a model for T2.
Thus A is a consistent mapping.

□


From Theorem 3.5, we see that the formula translation we have
defined works as the translation algorithm for formulas treated
as queries. Namely, if q is a query of T2, and T2 is defined as
a view of T1 by a set A of defining axioms, then for any
structure S1 of T1, a query q gives the same result on A(S1) as
the query q* does on S1. This type of formula translation is a
general form of query modification[Ston].
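
A small finite instance makes the equality [q,A(S1)] = [q*,S1] concrete. The sketch below is an illustration under assumed data: the relations S and T, the join view R and the query q(v1,v3) = (∃v2)R(v1,v2,v3) are chosen here for the example, and the function names are hypothetical.

```python
# Base state S1: two binary relations (hypothetical sample data).
S = {(1, "a"), (2, "b")}
T = {("a", "x"), ("b", "y"), ("c", "z")}


def view_R(S, T):
    """A(S1): R(v1,v2,v3) is defined as S(v1,v2) and T(v2,v3)."""
    return {(v1, v2, v3) for (v1, v2) in S for (u2, v3) in T if v2 == u2}


def q_on_view(R):
    """q(v1,v3) = (exists v2) R(v1,v2,v3), evaluated on the view state."""
    return {(v1, v3) for (v1, _v2, v3) in R}


def q_star_on_base(S, T):
    """q*(v1,v3) = (exists v2)(S(v1,v2) and T(v2,v3)), evaluated on S1."""
    return {(v1, v3) for (v1, v2) in S for (u2, v3) in T if v2 == u2}


print(sorted(q_on_view(view_R(S, T))))
print(sorted(q_star_on_base(S, T)))
```

The translated query never materializes R, which is the practical point of query modification: the view query is answered directly against the base relations.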

Theorem 3.6 specifies exactly what is needed to prove the
consistency property of a mapping. Since validity in Predicate
Calculus, which is equivalent to provability, is generally
undecidable, the consistency property is also unsolvable in
general. We record this in the following theorem:




Theorem 3.7. The problem of determining consistent structure
mappings, i.e., state mappings, is unsolvable.

□

In formal logic itself, there are a number of subsets of PC in
which provability is decidable[Klee]. What we would like to do
in  the  database  context  is  derive  a  language  which  can  express  an 
important  class  of  database  constraints  and  mappings,  but  which 
also  has  decidable  consistency  (and  other)  problems  and  for  which 
mappings  can  be  proved  correct.  This  problem  will  be  dealt  with 
in  succeeding  chapters. 


3.3. Examples of Consistency Derivations

Now  let  us  give  some  examples  of  state  mappings,  and  let  us 
see  how  one  can  prove  their  consistency. 

In  the  first  example,  we  specify  functional  dependencies  as  PC 
formulas and show how an FD in the view can be derived from FDs
in  the  base  schema: 


T1: S(2), T(2)   (predicate symbols or relation names)

    S(v1,v2) • S(v1,v3) => v2=v3   (axiom: an FD 1->2)

    T(v1,v2) • T(v1,v3) => v2=v3   (axiom: an FD 1->2)

T2: R(3)   (predicate symbol)

    R(v1,v2,v3) • R(v1,v4,v5) => v2=v4 • v3=v5   (axiom: an FD 1->2,3)

A:  R(v1,v2,v3) ≡ S(v1,v2) • T(v2,v3)
    (defining axiom: R is a join of S and T)

Let p denote the single axiom of T2. Then p*, the translation of
p into T1 by A, is the following formula:

    S(v1,v2) • T(v2,v3) • S(v1,v4) • T(v4,v5) => v2=v4 • v3=v5

(The first two conjuncts replace R(v1,v2,v3), and the last two
replace R(v1,v4,v5).)


We  can  prove  this  from  the  axioms  of  T1  as  follows: 


(1) S(v1,v2) • S(v1,v4) => v2=v4   (axiom of T1)

(2) T(v2,v3) • v2=v4 => T(v4,v3)   (axiom (xiii))

(3) S(v1,v2) • S(v1,v4) • T(v2,v3) => v2=v4 • T(v4,v3)
    ((1) & (2) & Theorem 3.1.i)

(4) T(v4,v3) • T(v4,v5) => v3=v5   (axiom of T1)

(5) S(v1,v2) • S(v1,v4) • T(v2,v3) • T(v4,v5) => v2=v4 • v3=v5
    ((3) & (4) & Theorem 3.1.i)

(Note. This proof actually left some details out. PC proofs in
general are quite tedious.)
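
The derivation can be sanity-checked by brute force on a small domain. The sketch below is not a proof and is not part of the original text: it enumerates every pair of binary relations S, T over a two-element domain that satisfy the FD axioms of T1, forms the join view R, and looks for a state in which the FD 1->2,3 fails. All function names are assumptions made for the illustration.

```python
from itertools import combinations, product


def all_relations(domain, arity):
    """Yield every relation of the given arity over a finite domain."""
    tuples = list(product(domain, repeat=arity))
    for k in range(len(tuples) + 1):
        for subset in combinations(tuples, k):
            yield set(subset)


def first_determines_rest(rel):
    """FD from coordinate 1 to all remaining coordinates."""
    return all(t[1:] == u[1:] for t in rel for u in rel if t[0] == u[0])


def join(S, T):
    """Defining axiom: R(v1,v2,v3) iff S(v1,v2) and T(v2,v3)."""
    return {(a, b, c) for (a, b) in S for (b2, c) in T if b == b2}


domain = (0, 1)
fd_relations = [r for r in all_relations(domain, 2)
                if first_determines_rest(r)]

# Models of T1 are pairs (S, T) in which both FD axioms hold.
models = [(S, T) for S in fd_relations for T in fd_relations]

# A counterexample would be a model of T1 mapped to a non-model of T2.
counterexamples = [(S, T) for (S, T) in models
                   if not first_determines_rest(join(S, T))]

print(len(models), len(counterexamples))
```

An empty list of counterexamples on this domain is consistent with the derivation but does not replace it, since the formal proof covers all models.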

In  the  next  example,  there  are  no  constraints  (nonlogical 
axioms)  in  T1  involving  the  predicate  S,  but  the  mapping 
(defining axiom for R) adds structure by selecting the minimum
value  of  the  second  domain  of  S  for  each  value  of  the  first 
domain.  We  prove  that  this  is  enough  to  derive  a  functional 
dependency  in  T2 : 

T1: S(2), ≤(2)   (axioms for ≤ implicit)

T2: R(2)

    R(v1,v2) • R(v1,v3) => v2=v3   (an FD axiom)

A:  R(v1,v2) ≡ S(v1,v2) • (∀v3)(S(v1,v3) => v2≤v3)

The translation of the single axiom of T2 by A into T1 is the
following, where '.=>.' denotes for clarity the outer level '=>':

    S(v1,v2) • (∀v4)(S(v1,v4) => v2≤v4) •
    S(v1,v3) • (∀v4)(S(v1,v4) => v3≤v4) .=>. v2=v3

(The first line replaces R(v1,v2), and the antecedent on the
second line replaces R(v1,v3).)

We can prove this formula as follows:

(1) (∀v4)(S(v1,v4) => v2≤v4) .=>. (S(v1,v3) => v2≤v3)
    (axiom (x) contrapositive)

(2) (∀v4)(S(v1,v4) => v3≤v4) .=>. (S(v1,v2) => v3≤v2)
    (axiom (x) contrapositive)

(3) S(v1,v3) • (S(v1,v3) => v2≤v3) .=>. v2≤v3   (Theorem 3.1.ii)

(4) S(v1,v2) • (S(v1,v2) => v3≤v2) .=>. v3≤v2   (Theorem 3.1.ii)

(5) v2≤v3 • v3≤v2 => v2=v3   (axiom for ≤)

(6) S(v1,v3) • (∀v4)(S(v1,v4) => v2≤v4) .=>. v2≤v3
    ((1) & (3) & Theorem 3.1.iii)

(7) S(v1,v2) • (∀v4)(S(v1,v4) => v3≤v4) .=>. v3≤v2
    ((2) & (4) & Theorem 3.1.iii)

(8) S(v1,v3) • (∀v4)(S(v1,v4) => v2≤v4) •
    S(v1,v2) • (∀v4)(S(v1,v4) => v3≤v4) .=>. v2≤v3 • v3≤v2
    ((6) & (7) & Theorem 3.1.iv)

(9) S(v1,v2) • (∀v4)(S(v1,v4) => v2≤v4) • S(v1,v3) •
    (∀v4)(S(v1,v4) => v3≤v4) .=>. v2=v3   ((5) & (8))
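
The same kind of finite check applies to this example. The sketch below is an illustration only, with hypothetical names and the integers' usual order standing in for the ordering predicate: it builds the minimum-selecting view for every binary relation S over a three-element domain and confirms that the view always satisfies the FD, even though S itself is unconstrained.

```python
from itertools import combinations, product


def min_view(S):
    """R(v1,v2) iff S(v1,v2) and v2 <= v3 for every v3 with S(v1,v3)."""
    return {(a, b) for (a, b) in S
            if all(b <= c for (a2, c) in S if a2 == a)}


def satisfies_fd(R):
    """R(v1,v2) and R(v1,v3) => v2 = v3."""
    return all(b == c for (a, b) in R for (a2, c) in R if a == a2)


domain = (0, 1, 2)
pairs = list(product(domain, repeat=2))

# Every binary relation over the domain is a legal state of T1,
# because T1 places no constraint on S.
all_S = [set(c) for k in range(len(pairs) + 1)
         for c in combinations(pairs, k)]

violations = [S for S in all_S if not satisfies_fd(min_view(S))]

print(len(all_S), len(violations))
print(sorted(min_view({(1, 5), (1, 3), (2, 7)})))
```

The check relies on antisymmetry of the order, which corresponds to step (5) of the derivation.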

3.4. Extensions of First Order Languages

In  the  rest  of  this  chapter,  we  will  discuss  how  a  predicate 
calculus-based  formalism  might  be  extended  to  represent  other 
facets  of  constraints  and  mappings. 

Any predicate calculus system (of any order) can only directly
represent what are called "static" constraints[ABCS]. That is,
the axioms represent properties holding in single states of the
database. Yet there are other kinds of constraints which are
needed in database systems. These are called transition
constraints. They represent relationships which must hold
between the database state before an update operation and the
state after the operation. For example, a transition constraint
on an employee database might be "employee salaries never
decrease", or "medical records are never deleted". We would like
to know if these kinds of constraints can be represented as
statements of a first order language; for if they could, then all
the available proof-theoretic machinery could be brought to bear
to prove consistency of mappings between schemas containing both
static and transition constraints. The answer is yes, such
constraints can be represented as PC formulas. We proceed as
follows:

For every predicate symbol (relation name) P in a theory
(schema) T, we introduce a new predicate symbol P°. For each
axiom p in T, we add an axiom p° obtained from p by replacing
each occurrence of a predicate symbol P by P°. Then to the
resulting theory we can add any axioms we like containing
predicates with or without the '°' mark. Structures for the
language L° of the augmented theory will represent transitions:
If P is an original predicate (relation), and if S is a structure
for L°, the augmented language, then P in S represents the
contents of P before the transition, and P° represents the
contents of P after the transition. Some examples follow which
illustrate the expression of transition constraints in this
formalism:

"salaries  of  employees  do  not  decrease" 

E(v1,v2) • E°(v1,v3) => v2≤v3,

where the first coordinate denotes the employee's name, and the
second denotes the salary.

"medical  records  are  never  deleted  (although  updates  are 
allowed) " 

M(v1,v2) => (∃v3)M°(v1,v3),

where  the  first  coordinate  denotes  the  key  of  the  record,  and  the 
second  denotes  the  remaining  data. 

"the  only  persons  who  can  be  added  to  the  club  membership  list 
are  those  who  are  on  the  approved  list" 

¬M(v1) • M°(v1) => A(v1),

where M is the member list predicate, and A is the approved
predicate.
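
Each of these transition axioms can be read as a check over a pair of finite states, with the unmarked relation holding the contents before the update and the '°' relation the contents after it. The sketch below is illustrative; the function names and sample data are assumptions, and salaries are compared with the ordinary numeric order.

```python
def salaries_never_decrease(E, E_after):
    """E(v1,v2) and E°(v1,v3) => v2 <= v3."""
    return all(old <= new
               for (name, old) in E
               for (name2, new) in E_after
               if name == name2)


def records_never_deleted(M, M_after):
    """M(v1,v2) => (exists v3) M°(v1,v3): every key survives the update."""
    keys_after = {key for (key, _data) in M_after}
    return all(key in keys_after for (key, _data) in M)


def only_approved_added(M, M_after, A):
    """not M(v1) and M°(v1) => A(v1): new members must be approved."""
    return all(person in A for person in M_after - M)


# Hypothetical before and after states.
E, E_after = {("ann", 100)}, {("ann", 120), ("bob", 90)}
M, M_after = {(7, "x-ray")}, {(7, "x-ray, revised")}
members, members_after, approved = {"ann"}, {"ann", "bob"}, {"bob"}

print(salaries_never_decrease(E, E_after))
print(records_never_deleted(M, M_after))
print(only_approved_added(members, members_after, approved))
```

Note that the second check permits updates to the data part of a record, matching the remark that updates are allowed while deletions are not.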

Next we give an example of a proof of the consistency of a
mapping when the schemas (theories) contain transition
constraints. In order to get a set of defining axioms for the
augmented theories, we introduce, for each defining axiom
P(v1,...,vn) ≡ d, the axiom P°(v1,...,vn) ≡ d°. The transition
constraint in T2, below, says that tuples of R are not deleted,
assuming domain 1 is an identifier. The mapping defines R as a
join of T1-predicates S and T. Note that there is no transition
constraint on T. The transition constraint on R follows from the
transition constraint of S and the static constraint involving
both S and T.

T1: S(2), T(2)

    S(v1,v2) => (∃v3) S°(v1,v3)     (no deletions from S)

    S(v1,v2) => (∃v3) T(v2,v3)      (a "foreign key" axiom)

    S°(v1,v2) => (∃v3) T°(v2,v3)    (a "foreign key" axiom)

T2: R(3)

    R(v1,v2,v3) => (∃v4)(∃v5) R°(v1,v4,v5)    (no deletions from R)

A:  R(v1,v2,v3) = S(v1,v2) • T(v2,v3)

    R°(v1,v2,v3) = S°(v1,v2) • T°(v2,v3)

The  translation  of  the  axiom  of  T2  by  A  into  T1  is 

S(v1,v2) • T(v2,v3) => (∃v4)(∃v5) (S°(v1,v4) • T°(v4,v5)),

where S(v1,v2) • T(v2,v3) is the translation of R(v1,v2,v3) and
S°(v1,v4) • T°(v4,v5) is the translation of R°(v1,v4,v5), and we
prove this axiom as follows:

(1) S°(v1,v4) => (∃v5) T°(v4,v5)                      (axiom of T1)

(2) S°(v1,v4) => S°(v1,v4) • (∃v5) T°(v4,v5)
                                          ((1) & Theorem 3.1.v)

(3) S°(v1,v4) => (∃v5) (S°(v1,v4) • T°(v4,v5))
                                          ((1) & Theorem 3.1.vi)

(4) (∃v4) S°(v1,v4) => (∃v4)(∃v5) (S°(v1,v4) • T°(v4,v5))
                                          ((3) & Theorem 3.1.vii)

(5) S(v1,v2) => (∃v4) S°(v1,v4)                       (axiom of T1)

(6) S(v1,v2) => (∃v4)(∃v5) (S°(v1,v4) • T°(v4,v5))    ((4) & (5))

(7) S(v1,v2) • T(v2,v3) => (∃v4)(∃v5) (S°(v1,v4) • T°(v4,v5))
                                          ((6) & Theorem 3.1.viii)
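
The formal proof can be complemented by an exhaustive semantic check on a very small universe. The sketch below enumerates every transition (S, T, S°, T°) over an assumed two-element universe, keeps those that satisfy the three axioms of T1, and confirms that the derived view always satisfies the axiom of T2. This is evidence on one finite universe only, not a substitute for the proof, and all names are hypothetical.

```python
from itertools import combinations, product

U = (0, 1)                                  # assumed two-element universe
PAIRS = [(a, b) for a in U for b in U]


def all_relations(tuples):
    """Every subset of the given list of tuples, as a list of sets."""
    return [set(c) for r in range(len(tuples) + 1)
            for c in combinations(tuples, r)]


def view(S, T):
    """The defining axiom A: R(v1,v2,v3) = S(v1,v2) • T(v2,v3)."""
    return {(a, b, c) for (a, b) in S for (b2, c) in T if b == b2}


def satisfies_T1(S, T, S_after, T_after):
    no_deletions = all(any(a == a2 for (a2, _) in S_after) for (a, _) in S)
    fk_before = all(any(b == b2 for (b2, _) in T) for (_, b) in S)
    fk_after = all(any(b == b2 for (b2, _) in T_after) for (_, b) in S_after)
    return no_deletions and fk_before and fk_after


def satisfies_T2(R, R_after):
    """R(v1,v2,v3) implies that some R°(v1,v4,v5) exists."""
    return all(any(a == a2 for (a2, _, _) in R_after) for (a, _, _) in R)


def consistent_on_small_universe():
    rels = all_relations(PAIRS)
    return all(satisfies_T2(view(S, T), view(S_after, T_after))
               for S, T, S_after, T_after in product(rels, repeat=4)
               if satisfies_T1(S, T, S_after, T_after))
```

The enumeration visits 16^4 candidate transitions, so it is only practical for tiny universes; that limitation is one reason a syntactic proof procedure is of interest.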

Another  area  in  which  to  expand  the  use  of  Predicate  Calculus 
as  a  schema  and  mapping  language  is  towards  higher  order 
languages.  In  first  order  languages  it  is  possible  to  express 
many  constraints  of  interest,  but  there  are  still  other 
constraints  useful  in  database  management  which  cannot  be 
captured by first order sentences. For example, it is possible
to  say  that  there  are  at  least  n  values  with  property  p,  or  that 
there  are  exactly  n  values  with  property  p,  etc.,  by  using  first 
order  definitions  of  the  form: 

(∃≥n vj) p = (∃v1)...(∃vn) (v1≠v2 • ... • vj≠vk • ... •
               p[v1/j] • ... • p[vn/j]),

where v1,...,vn are free for vj in p, and

(∃=n vj) p = (∃v1)...(∃vn) (v1≠v2 • ... • vj≠vk • ... •
               p[v1/j] • ... • p[vn/j] •
               (∀y)(p[y/j] => y=v1 + y=v2 + ...)).


However,  it  is  not  possible  to  say,  for  example,  that  the  number 
of  tuples  in  relation  R  is  equal  to  the  number  of  tuples  in 
relation  S.  Intuitively,  we  would  like  to  write 

(∀n) (∃=n v1)(∃v2) R(v1,v2) = (∃=n v1)(∃v2) S(v1,v2),

where domain 1 is the key of R and S. However, the 'n' in
'(∃=n v1)' is not a formal variable; it is a metavariable or a
syntactic  variable  —  a  device  to  allow  us  to  write  down  a 
countable  number  of  formulas  as  just  one  formula  scheme.  In 
order  to  express  this  kind  of  property,  we  need  to  use  second 
order  sentences. 

For example, in a second order language, a binary predicate N
could be defined which takes a first order predicate as the first
argument and an individual variable as the second. Then N(P,x)
could be interpreted as saying that the number of tuples in P is
x. (We could equally well define a function which takes a
predicate symbol as its single argument.) Other aggregate
functions such as Min, Max and Ave can also be defined formally
as second order predicate or function symbols. Since these are
useful functions in query languages, they would also be useful in
defining mappings.

Another  possible  application  of  second  order  languages  is  in 
formalizing  operations  and  operation  mappings.  For  example,  the 
insert  operator  can  be  defined  as  a  predicate  which  takes  a 
predicate argument and an individual variable argument. One of
the axioms for this insert predicate would be, for example:

(∀x)(∀P) insert(P,x)(x),

meaning that x is always a member of the insert(P,x) predicate
(relation). No more will be said here about this except that
there is a similarity between this approach and that of algebraic
specifications[Gutt].

3.5.  Summary  and  Conclusions 

This  chapter  has  had  two  objectives.  First,  we  gave  general 
syntactic (proof-theoretic) requirements for mappings to have the
consistency property defined in Chapter 2. We also showed in
Theorem 3.7 that the general problem of proving mappings
consistent is unsolvable. This provides the motivation for Parts
III and IV of the thesis, in which we restrict the schema and
mapping languages in order to solve the consistency problem for
structure mappings and also the correctness problem for operation
mappings.

The  second  objective  of  this  chapter  was  to  indicate  how 
Predicate  Calculus  can  be  used  as  a  data  model  and  as  a  mapping 
model.  There  has  been  much  debate  concerning  the  most  desirable 
formalism to use as a conceptual data model (e.g., [Kent],
[Nijs77], [Schd]). In view of the exhaustive research which has
gone into the properties of Predicate Calculus and related
languages, the use of formal logic seems attractive[Stee]. This
chapter  has  indicated  the  manner  in  which  formal  logic  can  be 
used  in  a  database  mapping  environment. 


Relational Structure Mappings


CHAPTER 4

Formulation and Soundness of Rules
for Relational State Mappings


In  the  previous  chapter,  we  defined  a  very  general  kind  of 
static  data  model  and  a  very  general  kind  of  structure  mapping 
both  based  on  Predicate  Calculus.  (By  "static"  we  mean  a  data 
model  without  the  operations  considered.)  We  were  able  to  give 
necessary  and  sufficient  conditions  for  such  structure  mappings 
to  be  consistent,  but  we  saw  that  no  algorithm  existed  to 
implement  the  conditions.  In  this  chapter  we  will  consider  a 
less  general  static  data  model  and  a  less  general  mapping  model. 
By making these restrictions, we will be able to give decidable
necessary  and  sufficient  conditions  for  a  structure  mapping  to  be 
consistent. 

We  can  view  the  contents  of  this  chapter  as  a  part  of  a  larger 
program  of  work.  We  know  from  Chapter  3  that  we  cannot  construct 
algorithms  for  recognizing  state  mappings  for  general  data  models 
and mapping models. However, recent work on database
constraints[Brod] has identified a restricted set of constraints
for  expressing  the  most  important  kinds  of  structures  which 
appear  in  data.  A  large  part  of  this  set  of  constructs  may  admit 
a  decidable  solution  to  the  state  mapping  recognition  problem. 

It would be extremely difficult, if not humanly impossible, to
directly construct algorithms for a powerful schema language.
However,  by  starting  with  the  most  basic  constraint  types,  it 
should  be  possible  to  get  decidable  necessary  and  sufficient 
conditions  for  state  mappings. 

The basic constraints we start with in this chapter are
functional dependencies within a relational model. Functional
dependencies are one of the most basic structures for database
relations, and they have been studied extensively (e.g., [Bern76]).

This  chapter  has  four  sections,  and  it  proceeds  as  follows: 

In  the  first  section  we  present  the  definitions  of  the  relational 
data  model  and  the  structure  mapping  model.  The  analogies  noted 
in the previous chapter allow these definitions to proceed very
smoothly.

In  Section  2  we  state  the  problem  we  are  dealing  with: 
determining  if  a  given  structure  mapping  is  actually  a  state 
mapping.  As  we  have  noted  in  Chapter  2,  this  is  a  very  basic  and 
important  property  for  a  structure  mapping  to  have.  We  reduce 
this  problem  to  the  problem  of  calculating  valid  constraints  on 
relational  algebra  expressions.  Then  comes  the  task  of 
formulating  the  rules  to  solve  this  problem. 

Section  3  shows  that  the  given  rules  are  sound,  that  is,  that 
they  do  not  say  a  constraint  is  valid  (or  a  mapping  is 
consistent)  when  it  is  not. 

Section  4  provides  concluding  remarks. 

4.1. A Relational Data Model and Mapping Model

In this section we will describe the static part of the data
model RDm=(RSch,RStr,RSt,RQ) to be studied. (The set RQ of
operations will be defined in the next chapter.) We first define
the set RSch of schemas:

Definition 4.1. A schema consists of a set of relation
declarations of the form 'name(degree)', a set of functional
dependency constraints and a set of equality constraints. A
functional dependency (FD) has the form 'name:Z->A', where 'Z' is
a (possibly empty) set of positive integers, and 'A' is a
positive integer. An equality constraint (EQ) either has the
form 'name:X=Y' (DEQ) or the form 'name:X=V' (VEQ), where X, Y
and V are positive integers. X, Y, Z and A refer to relation
domains, and V denotes a data value. When R is a relation name
declared in a schema s, we will write R ∈ s.

□

At  this  point  it  may  not  be  clear  why  we  would  want  such  EQ 
constraints  in  our  model.  After  all,  whenever  there  are  EQs, 
projections  can  be  applied  to  remove  some  domains  without  losing 
any information. However, we will see that the mappings we
consider require the EQs to be present.

A  schema  corresponds  to  a  theory  of  Chapter  3.  The  set  of 
relation  declarations  corresponds  to  the  language  of  the  theory, 
and  the  set  of  constraints  corresponds  to  the  nonlogical  axioms 
of  the  theory.  The  set  of  positive  integers  is  considered  to  be 
the set of constant symbols of the language. That is, in a VEQ
such as R:2='3', the '3' is a constant symbol.

Attributes  are  identified  by  number  (the  position  in  the 
relation)  rather  than  by  name.  If  we  named  attributes,  as  is 
commonly  done  in  many  data  models,  there  would  be  problems  with 
naming joins. (We do not assume a universal relation[AhBU].) For
example, given a relation Emp(E#,Addr,Mgr#), the join
Emp[Mgr#=E#]Emp, which might be needed in order to find addresses
of managers of given employees, cannot have its attributes named
by the names of the attributes of its component relations since
this would result in duplicate names. In our notation, we would
declare the relation Emp(3) and would write the join as
Emp[3=1]Emp. The domain "names" of this join are 1, 2, 3, 4, 5 and
6.
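
To make the positional convention concrete, the following sketch evaluates the join Emp[3=1]Emp on a small made-up relation. Tuples are Python tuples, domain i is tuple position i-1, and the sample data and helper names are assumptions introduced only for illustration.

```python
def cross(r, s):
    """Cross product: concatenate every tuple of r with every tuple of s."""
    return {t1 + t2 for t1 in r for t2 in s}


def restrict(r, x, y):
    """Keep the tuples whose domains x and y (1-based) are equal."""
    return {t for t in r if t[x - 1] == t[y - 1]}


def join(r, x, y, s, deg_r):
    """r[x=y]s, where y refers to s and is shifted by deg_r, the degree of r."""
    return restrict(cross(r, s), x, deg_r + y)


# Hypothetical contents of Emp(3): (employee number, address, manager number)
emp = {("e1", "12 Oak St", "e2"), ("e2", "9 Elm St", "e2")}

joined = join(emp, 3, 1, emp, 3)                      # Emp[3=1]Emp, domains 1..6
emp_and_mgr_addr = {(t[0], t[4]) for t in joined}     # projection on domains 1, 5
```

Because the result simply has domains 1 through 6, no name clash arises even though both operands are the same relation.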


Now  let  us  define  the  set-function  RStr  of  structures. 

Definition 4.2. Let s be a schema in RSch. A structure str for
s, i.e., an element of RStr(s), consists of a set U, the "data
universe", a sequence (c(i) : i=0,1,2,...) and a sequence
(R(str) : R ∈ s) such that if R was declared in s with degree n,
then R(str) is a finite set of n-tuples over U. Each c(i) is an
element of U and interprets the constant symbol i.


□


To  continue  the  comparison  with  Chapter  3,  we  see  that  the 
structures  defined  in  this  section  correspond  to  the  structures 
of the previous chapter. Including interpretations of constant
symbols may seem artificial, but it is necessary to maintain the
parallel with Predicate Calculus for future undecidability
proofs.

Given,  now,  a  schema  s  with  a  structure  str,  we  can  define 
what it means for an FD or EQ to be satisfied or true in str.

Definition 4.3. Let s ∈ RSch and str ∈ RStr(s). A functional
dependency R:Z->A, where R ∈ s, is true or satisfied in str if
the following holds: For every pair of tuples t1, t2 in R(str),
if t1[Z]=t2[Z], then t1[A]=t2[A]. When Z=∅, this means that
t1[A]=t2[A] for all t1, t2 ∈ R(str). (Brackets are domain
selectors.) An equality R:X=Y is true in str if t[X]=t[Y] for
every t ∈ R(str), and an equality R:X=V is true in str if
t[X]=c(V) for every t ∈ R(str). We will often simply write this
last equality as t[X]=V, assuming that positive integer symbols
interpret themselves.

□

The parallel with the definitions of Chapter 3 continues as we
define the set-function RSt of states.

Definition 4.4. Let s ∈ RSch. A structure str ∈ RStr(s) is a
state, i.e., an element of RSt(s), if every constraint in s is
true in str. An FD R:Z->A or an EQ R:X=Y or R:X=V, where R ∈ s,
is valid in s if it is true in every state of s.

□
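
Definitions 4.3 and 4.4 translate almost directly into executable checks. The sketch below assumes that a structure is a Python dict from relation names to sets of tuples, that constants interpret themselves, and that constraints are encoded as tuples such as ("FD", "R", {1}, 2); this encoding and the function names are hypothetical.

```python
def fd_holds(rel, Z, A):
    """Z->A: tuples that agree on the domains in Z agree on domain A."""
    seen = {}
    for t in rel:
        key = tuple(t[i - 1] for i in sorted(Z))   # Z empty gives the key ()
        if seen.setdefault(key, t[A - 1]) != t[A - 1]:
            return False
    return True


def deq_holds(rel, X, Y):
    """X=Y: domains X and Y are equal in every tuple."""
    return all(t[X - 1] == t[Y - 1] for t in rel)


def veq_holds(rel, X, V):
    """X=V: domain X equals the constant V in every tuple."""
    return all(t[X - 1] == V for t in rel)


def is_state(structure, constraints):
    """A structure is a state if every constraint of the schema is true in it."""
    checks = {"FD": fd_holds, "DEQ": deq_holds, "VEQ": veq_holds}
    return all(checks[kind](structure[name], a, b)
               for (kind, name, a, b) in constraints)


structure = {"R": {(0, 1, 2), (0, 1, 1), (3, 2, 2)}}
```

For this structure, a schema containing only R:1->2 makes it a state, whereas adding R:1->3 does not, since the tuples (0,1,2) and (0,1,1) agree on domain 1 but differ on domain 3.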


The  states  of  this  chapter  correspond  to  the  models  of  a 
theory  defined  in  Chapter  3.  Validity  of  constraints  corresponds 
to  validity  of  formulas  in  a  theory. 

We have now defined the first three components, RSch, RStr and
RSt, of our data model RDm. We can next define the state
component RMs of a relational mapping model RMm=(RMs,RMq).

In Chapter 3, structure mappings were defined by sets of
defining axioms. The defining formula in such an axiom
corresponds to a database query. A natural choice for a query
language on our model RDm is relational algebra[Codd72a]. This
language is well understood and powerful. It also has a simple
recursive syntax which makes it attractive because it will lend
itself to proofs by mathematical induction. We will first give
some definitions for relational algebra and then use these to
define mappings.


Definition 4.5. A relational algebra expression is defined by one
of the following syntaxes:

[I]    e  ::= name | e[X] | e[X=Y] | e[X=V] | e⊗e | e+e | e−e

[II]   e  ::= name | e[X] | e[X=Y] | e[X=V] | e⊗e | e+e

[III]  e  ::= e1
       e1 ::= e2 | e1+e1 | e1[X]
       e2 ::= e3 | e2[X=Y]
       e3 ::= e4 | e3⊗e3
       e4 ::= name | e4[X=V]

[IV]   e  ::= e1
       e1 ::= e2 | e1+e1 | e1[X]
       e2 ::= e3 | e2⊗e2
       e3 ::= name | e3[X=V]


If  s  €  RSch,  we  will  let  Exp(s)  denote  the  set  of  relational 
algebra  expressions  defined  on  relation  names  in  s.  The  degree 
deg(e)  of  an  expression  e  (with  respect  to  a  schema  s)  can  also 
be  defined: 

[1] deg(R) = n, where R(n) is a declaration in s

[2] deg(e[X]) = the size of the list X

[3] deg(e[X=Y]) = deg(e)

[4] deg(e[X=V]) = deg(e)

[5] deg(e1⊗e2) = deg(e1) + deg(e2)

[6] deg(e1+e2) = deg(e1) = deg(e2)

[7] deg(e1−e2) = deg(e1) = deg(e2)

□
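
The degree rules [1] to [7] amount to a recursive function over the expression syntax. The sketch below assumes a hypothetical encoding of expressions as nested tuples, for example ("cross", e1, e2), and of a schema as a dict from relation names to degrees; neither encoding is part of the thesis.

```python
def deg(e, schema):
    """Degree of an expression, following rules [1] to [7]."""
    op = e[0]
    if op == "rel":                       # [1] ("rel", name)
        return schema[e[1]]
    if op == "proj":                      # [2] ("proj", e, X)
        return len(e[2])
    if op in ("restr", "sel"):            # [3], [4] ("restr", e, X, Y) / ("sel", e, X, V)
        return deg(e[1], schema)
    if op == "cross":                     # [5] ("cross", e1, e2)
        return deg(e[1], schema) + deg(e[2], schema)
    if op in ("union", "diff"):           # [6], [7] operands must agree in degree
        d1, d2 = deg(e[1], schema), deg(e[2], schema)
        if d1 != d2:
            raise ValueError("operands must have equal degree")
        return d1
    raise ValueError(f"unknown operator {op!r}")


schema = {"Emp": 3}
emp = ("rel", "Emp")
self_join = ("restr", ("cross", emp, emp), 3, 4)    # Emp[3=1]Emp written out
```

Rules [6] and [7] are read here as a well-formedness condition: a union or difference is only meaningful when both operands have the same degree, so the sketch rejects anything else.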

The operations used are projection e[X], restriction e[X=Y],
selection e[X=V], cross product e1⊗e2, union e1+e2 and difference
e1−e2. In a restriction e[X=Y] or a selection e[X=V], X, Y and V
are formally single items; however, for notational convenience we
will sometimes abbreviate expressions such as R[1=2][4=5] and
R[1='0'][2='8'] by R[1,4=2,5] and R[1,2='0','8'], respectively.
These operators have the usual meaning which is formally defined
in the next definition.

Syntax I is the most general syntax. Syntax II does not
contain the set difference operator. Syntax III does not contain
the  set  difference  operator  and  restricts  the  operations  to  a 
certain  order.  Syntax  IV  has  the  same  ordering  restrictions  as 
syntax  III  and  omits  restriction  as  well  as  set  difference. 

The  next  definition  extends  the  notions  of  truth  and  validity 
to  constraints  on  expressions,  and  it  specifies  the  semantics  of 
the  relational  algebra  expressions. 

Definition 4.6. Let s ∈ RSch, str ∈ RStr(s) and e ∈ Exp(s). The
value e(str) of e in str is defined as below by induction on the
length of e:

[1] R(str) as in Definition 4.2

[2] e[X](str) = {t[X] : t ∈ e(str)}

[3] e[X=Y](str) = {t : t[X]=t[Y] & t ∈ e(str)}

[4] e[X=V](str) = {t : t[X]=V & t ∈ e(str)}

[5] (e1⊗e2)(str) = {t1⊗t2 : t1 ∈ e1(str) & t2 ∈ e2(str)},
    where t1⊗t2 denotes the concatenation of t1 and t2

[6] (e1+e2)(str) = {t : t ∈ e1(str) or t ∈ e2(str)}

[7] (e1−e2)(str) = {t : t ∈ e1(str) & t ∉ e2(str)}

If e ∈ Exp(s) has degree n, and c is a constraint involving only
domain names 1 through n (i.e., if c is meaningful for e), then
e:c is true in str if c is true in e(str); i.e., an FD e:Z->A is
true in str if for all tuples t1,t2 ∈ e(str), t1[Z]=t2[Z] implies
t1[A]=t2[A]; a DEQ e:X=Y is true in str if for all tuples t ∈
e(str), t[X]=t[Y], and a VEQ e:X=V is true in str if for all
tuples t ∈ e(str), t[X]=V. The constraint e:c is valid in s if
e:c is true in st for every state st ∈ RSt(s).

□


We give below some examples of these operations:

  R            S
  0 1 2        4 5 6
  0 1 1        1 2 3
  3 2 2        0 1 1

  R[1,2]       R[2=3]       R[1='0']
  0 1          0 1 1        0 1 2
  3 2          3 2 2        0 1 1

  R⊗S                  R+S          R−S
  0 1 2 4 5 6          0 1 2        0 1 2
  0 1 2 1 2 3          0 1 1        3 2 2
  0 1 2 0 1 1          3 2 2
  0 1 1 4 5 6          4 5 6
  0 1 1 1 2 3          1 2 3
  0 1 1 0 1 1
  3 2 2 4 5 6
  3 2 2 1 2 3
  3 2 2 0 1 1
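
The semantics of Definition 4.6 can be exercised directly on the example relations R and S. In the sketch below the relation contents are as read from the example, relations are Python sets of tuples, union and difference are the built-in set operators, and the helper names are assumptions.

```python
R = {(0, 1, 2), (0, 1, 1), (3, 2, 2)}
S = {(4, 5, 6), (1, 2, 3), (0, 1, 1)}


def project(r, X):
    """e[X]: keep the listed domains (1-based), in the listed order."""
    return {tuple(t[i - 1] for i in X) for t in r}


def restrict(r, X, Y):
    """e[X=Y]: keep the tuples whose domains X and Y are equal."""
    return {t for t in r if t[X - 1] == t[Y - 1]}


def select(r, X, V):
    """e[X=V]: keep the tuples whose domain X equals the constant V."""
    return {t for t in r if t[X - 1] == V}


def cross(r, s):
    """e1⊗e2: concatenate every tuple of r with every tuple of s."""
    return {t1 + t2 for t1 in r for t2 in s}


r_proj = project(R, [1, 2])     # R[1,2]
r_restr = restrict(R, 2, 3)     # R[2=3]
r_sel = select(R, 1, 0)         # R[1='0']
r_cross_s = cross(R, S)         # R⊗S, nine tuples of degree 6
r_union_s = R | S               # R+S
r_diff_s = R - S                # R−S
```

Since relations are sets, the tuple (0,1,1) shared by R and S appears only once in the union, and the projection R[1,2] has two tuples rather than three.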

We  have  not  included  three  other  traditional  operations  of 
relational  algebra,  namely,  join,  intersection  and  division. 

These  operations  are  not  needed  because  they  can  be  derived  from 
the  given  ones.  Expressions  defining  these  operations  are: 

e1[X=Y]e2 = (e1⊗e2)[X=Y'],

where Y' is Y displaced by the number of domains in e1: A join
is a restriction of a cross product.

e1∩e2 = (e1[D1=D2]e2)[D1],

where D1 is 1,2,...,deg(e1) and D2 is 1,2,...,deg(e2): An
intersection is a projection on the first half (or any other
equivalent set of domains) of a join where the join-condition
equates all corresponding domains of e1 and e2.

e1[X/Y]e2 = e1[X̄]−((e1[X̄]⊗e2[Y])−e1)[X̄],

where X̄ denotes the domains of e1 not in X: Division can be
defined using cross product, difference and projection.
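
The three derived operations can be sketched on top of the basic ones. The code below is a minimal illustration under stated assumptions: relations are sets of tuples, degrees are passed explicitly, and for division the first operand is reordered so that the domains not in X come first, which the formula above leaves implicit. All names and the sample data are hypothetical.

```python
def project(r, X):
    return {tuple(t[i - 1] for i in X) for t in r}


def restrict(r, X, Y):
    return {t for t in r if t[X - 1] == t[Y - 1]}


def cross(r, s):
    return {t1 + t2 for t1 in r for t2 in s}


def join(e1, X, Y, e2, deg1):
    """e1[X=Y]e2 = (e1⊗e2)[X=Y'], with Y' = Y + deg(e1)."""
    return restrict(cross(e1, e2), X, deg1 + Y)


def intersect(e1, e2, n):
    """e1∩e2: equate all corresponding domains, then project the first half."""
    r = cross(e1, e2)
    for i in range(1, n + 1):
        r = restrict(r, i, n + i)
    return project(r, range(1, n + 1))


def divide(e1, X, e2, Y, deg1):
    """e1[X/Y]e2 via projection, cross product and difference."""
    x_bar = [i for i in range(1, deg1 + 1) if i not in X]
    e1_reordered = project(e1, x_bar + list(X))     # complement domains first
    candidates = project(e1, x_bar)
    missing = cross(candidates, project(e2, Y)) - e1_reordered
    return candidates - project(missing, range(1, len(x_bar) + 1))


# Hypothetical data: who takes which course, and the list of all courses
takes = {("ann", "db"), ("ann", "os"), ("bob", "db")}
courses = {("db",), ("os",)}
takes_all = divide(takes, [2], courses, [1], 2)
```

Here `takes_all` contains only ("ann",): the pair ("bob", "os") is generated by the cross product but is missing from `takes`, so "bob" is subtracted from the candidates.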

We  are  now  ready  to  define  the  relational  structure  mappings. 

Definition 4.7. Let s1, s2 ∈ RSch. A structure mapping m ∈
RMs(s1,s2) is a set of equations of the form R=e, where R ∈ s2
and e ∈ Exp(s1), where exactly one such equation appears in m for
each R ∈ s2. The function RStr(s1)->RStr(s2) on structures
determined by m is defined as follows: If str1 ∈ RStr(s1), then
str2=m(str1) consists of the universe U of str1, the same
sequence (c(i) : i=0,1,2,...) of constants, and the sequence
(R(str2) : R ∈ s2) of tuple sets, such that for each R ∈ s2 with
R=e in m, R(str2)=e(str1).

□

This definition directly parallels the one given in Chapter 3
for structure mappings between theories. The equations R=e
correspond to the defining axioms P(v1,...,vn)=d. The equations
R(str2)=e(str1) correspond to the equations P=[d,S1].

4.2.  Determining  State  Mappings 

Having  presented  the  necessary  definitions,  we  can  now  express 
the  main  problem  to  be  dealt  with  in  this  chapter: 

Problem 1. Given s1 and s2 in RSch and m in RMs(s1,s2),
determine  if  m  is  or  is  not  a  state  mapping. 

In  Chapter  3  we  saw  that  the  semantic  problem  of  determining 
state  mappings  was  equivalent  to  a  syntactic  problem  of  proving 
certain  theorems.  The  syntactic  problem  was  to  take  the 
nonlogical  axioms  (constraints)  of  the  view  theory,  translate 
them  according  to  the  defining  axioms  (mapping  definition)  into 
formulas  of  the  underlying  theory  (base  schema)  and  to  prove 
these  formulas  from  the  axioms  (constraints)  of  the  underlying 
theory.  We  would  like  to  be  able  to  do  the  same  thing  here  — 
but  with  decidable  proof  procedures.  Since  there  are  many 
decidable  problems  relating  to  functional  dependencies  (see  e.g., 
[BeFH]), it is reasonable to think that Problem 1 above may also
be  decidable. 

Before we discuss inference rules for our schema language, we
will  reduce  Problem  1  to  the  precise  proof  problem  which  we  will 
need  to  solve. 

According to Theorem 3.6, we must form the translation into s1
of the constraints in schema s2 (the view schema). This process
for RDm is much simpler than for Predicate Calculus formulas:

Each s2-constraint is of the form R:c (where c is Z->A, X=Y or
X=V), and there is some equation of the form R=e in the mapping
m. Therefore the translation of R:c into s1 by m is simply e:c.
We state below (without proof) the analog of Theorem 3.6:

Theorem 4.1. Let s1, s2 ∈ RSch and m ∈ RMs(s1,s2). Then m is
consistent if and only if for each constraint R:c of s2, e:c is
valid in s1, where R=e is in m.

□

With this theorem, Problem 1 is reduced to the following one:

Problem 2. Given s ∈ RSch and e ∈ Exp(s), determine whether or
not e:c is valid in s, where c is a functional dependency or an
equality constraint.

Much  of  what  follows  in  this  chapter  will  concern  itself  with 
the  above  problem  of  determining  the  validity  of  constraints  on 
relational  algebra  expressions. 

At  the  beginning  of  this  chapter,  we  stated  that  we  were 
primarily  interested  in  functional  dependencies  as  a  basic 
constraint type; yet we also included equality constraints in the
schema language. Before we begin discussion of inference rules,
we will give some informal examples of why EQs are needed:

Consider the following example:

s: R(2), S(2)
   R:1->2, S:1->2
e: R[2=1]S

In e the FDs 1->2 and 3->4 are valid. Because domains 2 and 3,
which are the joined domains, are always equal in e, we can
follow the FD 1->2 and then 3->4 to get a valid FD 1->4 in e. In
order to be able to formally derive this FD, we need domain
equalities (DEQs) as a constraint type.
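
The claim that 1->4 is valid in R[2=1]S can be spot-checked by enumerating all states over a small universe. The sketch below assumes a two-element universe and hypothetical helper names; an exhaustive check of this kind gives evidence for one finite universe only and is not the inference procedure developed in this chapter.

```python
from itertools import combinations

U = (0, 1)                                  # assumed two-element universe
PAIRS = [(a, b) for a in U for b in U]


def fd_holds(rel, Z, A):
    """Z->A: tuples that agree on the domains in Z agree on domain A."""
    return all(t[A - 1] == u[A - 1]
               for t in rel for u in rel
               if all(t[i - 1] == u[i - 1] for i in Z))


def states():
    """All degree-2 relations over U in which the FD 1->2 is true."""
    for r in range(len(PAIRS) + 1):
        for c in combinations(PAIRS, r):
            if fd_holds(set(c), [1], 2):
                yield set(c)


def join_2_1(R, S):
    """R[2=1]S: cross product restricted to domain 2 = domain 3."""
    return {t + u for t in R for u in S if t[1] == u[0]}


fd_1_4_valid = all(fd_holds(join_2_1(R, S), [1], 4)
                   for R in states() for S in states())
fd_4_1_valid = all(fd_holds(join_2_1(R, S), [4], 1)
                   for R in states() for S in states())
```

On this universe `fd_1_4_valid` comes out true, while the converse dependency 4->1 is refuted, for instance by R = {(0,0),(1,0)} and S = {(0,0)}.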

Next  consider  the  example: 

s: R(3), R:1,2->3
e: R[2='8']

In this example, states for the expression e will always have a
value '8' in domain 2. Thus, given a value for domain 1, the
value of domain 2 is determined, and therefore domain 3 is
determined by the FD. Hence, the FD 1->3 is valid in e. In
order to be able to formally derive this FD, we need value
equalities (VEQs) as a constraint type.

We  now  proceed  to  derive  inference  rules  for  these 
constraints.  It  is  natural  to  develop  these  rules  in  two  steps. 
First,  we  will  need  rules  describing  the  derivation  of  new 
constraints  from  old  ones  on  the  same  expression.  For  example, 
regardless  of  what  the  expression  is,  we  can  derive  1->3  from 
1->2  and  2->3.  The  second  group  of  rules  will  tell  how  to  derive 
constraints  on  an  expression  from  the  derivable  constraints  on 
its component subexpressions. For example, from R:2->3 we can
derive 1->2 on the projection R[2,3].

We  will  first  discuss  constraints  on  single  expressions. 

We  assume  that  a  schema  s  is  given  and  that  e  €  Exp(s).  We 
first  note  informally  that  there  are  only  a  finite  possible 
number  of  valid  constraints  on  e.  For  if  the  degree  of  e  is  n, 
there are only n*(2**n) possible FD statements Z->A and n*n
possible DEQ statements X=Y. There are an infinite number of
possible VEQ statements X=V, but since the schema will contain
only a finite number of constants, there are really only a finite
number of VEQs we could possibly derive. This finiteness
property means that the inference rules can be expressed in terms
of "the set of all derivable constraints is ...." rather than in
terms such as "constraint c is provable if ....". This is mainly a
notational convenience.
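
The counting argument can be confirmed by enumeration: there are 2**n choices for the set Z and n choices for A, and n choices each for X and Y. The function names below are hypothetical.

```python
from itertools import combinations


def all_fd_statements(n):
    """Every FD statement Z->A over domains 1..n, as (frozenset Z, A) pairs."""
    domains = range(1, n + 1)
    return [(frozenset(Z), A)
            for r in range(n + 1)
            for Z in combinations(domains, r)
            for A in domains]


def all_deq_statements(n):
    """Every DEQ statement X=Y over domains 1..n, as (X, Y) pairs."""
    return [(X, Y) for X in range(1, n + 1) for Y in range(1, n + 1)]


n = 3
fd_count = len(all_fd_statements(n))      # n * 2**n
deq_count = len(all_deq_statements(n))    # n * n
```

For n = 3 this gives 24 FD statements and 9 DEQ statements, including trivial ones such as 1->1 and 1=1.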

In  this  case,  we  can  express  the  problem  of  deriving 
constraints  on  a  single  expression  as  follows: 

Problem 3. Suppose s ∈ RSch and e ∈ Exp(s). If S is a set of
constraints  on  e,  derive  the  set  of  all  constraints  whose 
validity  is  implied  by  the  validity  of  the  constraints  in  S. 

When  only  FDs  are  considered,  complete  sets  of  rules  already 
exist[Arms]. The following three rules are one such set:

[A1] X->X (reflexivity)

[A2] X->Z => X+Y->Z (augmentation)

[A3] X->Y • Y+Z->W => X+Z->W (pseudotransitivity)


This means that if the set S consists only of FDs, the closure
Cl(S) of S (all the FDs derivable from those in S) can be
defined by the following inductive rules:

[0] X->Y ∈ S => X->Y ∈ Cl(S)

[1] X->X ∈ Cl(S)

[2] X->Z ∈ Cl(S) => X+Y->Z ∈ Cl(S)

[3] X->Y, Y+Z->W ∈ Cl(S) => X+Z->W ∈ Cl(S)

That the rules are sound and complete means that Cl(S) is
exactly the set of FDs whose validity follows from assuming the
validity of the constraints in S. See [Bern75] for examples.
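
Membership of an FD in the closure can be tested without enumerating the whole closure. The sketch below swaps in the standard attribute-closure technique rather than applying rules [0] to [3] literally: Z->A is derivable exactly when A lies in the set of domains reachable from Z. FDs are encoded as hypothetical (set, integer) pairs.

```python
def attribute_closure(Z, fds):
    """All domains determined by Z under the given FDs (lhs set, rhs domain)."""
    closure = set(Z)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and rhs not in closure:
                closure.add(rhs)
                changed = True
    return closure


def derivable(Z, A, fds):
    """True if the FD Z->A follows from the given FDs."""
    return A in attribute_closure(Z, fds)


S = [({1}, 2), ({2}, 3)]            # the FDs 1->2 and 2->3
chain = derivable({1}, 3, S)        # 1->3 by transitivity
```

The loop adds at most one domain per productive pass, so it terminates after at most n passes for an expression of degree n.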

Now suppose that S also contains EQs.

The rules for generating Cl(S) must include ones reflecting
the basic properties of equality. Everything equals itself;
hence, X=X will always be in Cl(S). The constraint X=Y implies
the constraint Y=X. Also X=Y and Y=Z imply X=Z.

VEQs also interact with DEQs: If X=Y and Y=V are constraints,
then the X-domains are constant and also equal V, i.e., X=V is a
valid constraint. If X=V and Y=V are derivable, then so is X=Y.

EQs also interact with FDs. If X=Y holds and X appears on the
left- or right-hand-side of an FD, then the FD with X replaced by
Y should also hold. If X=V holds and X appears on the
left-hand-side of an FD, then since the X-domain is constant, we
can drop X from the FD.

From this discussion we see that we should include rules such
as:

[i]   Z->A, X=Y ∈ Cl(S) => (Z−X)+Y->A ∈ Cl(S)

[ii]  Z->A, A=B ∈ Cl(S) => Z->B ∈ Cl(S)

[iii] Z->A, X=V ∈ Cl(S) => (Z−X)->A ∈ Cl(S)

However, we will use the following two rules:

[a] X=Y ∈ Cl(S) => X->Y ∈ Cl(S)

[b] X=V ∈ Cl(S) => ∅->X ∈ Cl(S)

With these two rules, we can drop the reflexive rule [1] above,
and we can also derive rules [i], [ii] and [iii] using
transitivity.
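
Rules [a] and [b] can be illustrated by turning each EQ into FDs and then testing derivability. The sketch below uses attribute closure as a stand-in for the inference rules, applies rule [a] to both X=Y and its symmetric form Y=X, and replays the two informal examples given earlier; the encoding and names are assumptions.

```python
def attribute_closure(Z, fds):
    """All domains determined by Z under the given FDs (lhs set, rhs domain)."""
    closure = set(Z)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and rhs not in closure:
                closure.add(rhs)
                changed = True
    return closure


def eqs_as_fds(deqs, veqs):
    """Rule [a] for each DEQ (in both directions) and rule [b] for each VEQ."""
    fds = []
    for x, y in deqs:
        fds += [({x}, y), ({y}, x)]
    for x, _value in veqs:
        fds.append((set(), x))              # the empty set determines x
    return fds


# R[2='8'] with R:1,2->3, so 1->3 should be derivable
selection_fds = [({1, 2}, 3)] + eqs_as_fds([], [(2, "8")])
# R[2=1]S with R:1->2 and S:1->2 (3->4 in the join), plus the DEQ 2=3
join_fds = [({1}, 2), ({3}, 4)] + eqs_as_fds([(2, 3)], [])
```

In the first case the closure of {1} picks up domain 2 through the FD ∅->2 and then domain 3, which is the effect of rule [iii]; in the second, the DEQ 2=3 bridges the two FDs, giving 1->4.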


There is another kind of interaction which is more difficult
to detect. This interaction could never occur when calculating
constraints on a single relation, and much of this part of the
thesis is devoted to solving this difficult problem. To
illustrate, consider the following example:

s: R(2), R:1->2
e: R[1=1]R

For this expression, the set S will contain the FDs 1->2 and
3->4, and there is also the EQ 1=3. But the two FDs are really
the "same" since they arise from the same FD on the same base
relation. Because the left-hand-sides of the FDs are equated by
the EQ, the right-hand-sides must also be equal, i.e., the EQ 2=4
must also hold. In more general terms, whenever one or more
relations appear as components on both sides of a join, some of
the FDs on the join might actually be the "same".
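
The extra equality 2=4 can be observed by evaluating R[1=1]R on every state over a small universe. The sketch assumes a two-element universe and hypothetical helper names; it also shows that the equality fails as soon as R violates its FD, which is why the FD's origin matters.

```python
from itertools import combinations

U = (0, 1)                                  # assumed two-element universe
PAIRS = [(a, b) for a in U for b in U]


def fd_1_2(rel):
    """R:1->2 is true in rel."""
    return all(t[1] == u[1] for t in rel for u in rel if t[0] == u[0])


def self_join(R):
    """R[1=1]R: pairs of R-tuples that agree on domain 1."""
    return {t + u for t in R for u in R if t[0] == u[0]}


def deq_2_4(rel):
    """The EQ 2=4 is true in rel."""
    return all(t[1] == t[3] for t in rel)


states = [set(c)
          for r in range(len(PAIRS) + 1)
          for c in combinations(PAIRS, r)
          if fd_1_2(set(c))]
eq_holds_in_every_state = all(deq_2_4(self_join(R)) for R in states)

not_a_state = {(0, 0), (0, 1)}              # violates R:1->2
```

Note that the FDs 1->2 and 3->4 together with the EQ 1=3 would not by themselves imply 2=4 if the two FDs came from different base relations; the equality depends on both copies being the same relation.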

The  following  is  a  slightly  more  complicated  example: 

s: R(3), S(3), T(3)
   R:1->2
e: (R⊗S)[1,2][1=6](S[2=1]R)

In the expression e, the (same) FD on R appears as 1->2 and as
1->7. Our formal rules must be able to recognize these FDs as
having the same origin in R so that we can generate the EQ 2=7 on
e.

A similar thing happens with unions. This will be illustrated
shortly.

In order to tell when two FDs have the same origin, they need
to be given labels or identifiers, so that when two labels are
equal (or "essentially" equal, as explained below), the two FDs
are one and the same. Note that the "origin" of an FD in a
relational expression can be tree-like. The tree structure is
not only due to the axioms for FDs, especially
pseudotransitivity, but also to the tree structure of the
relational expressions themselves. In order to understand what
these tree structures are like and how they relate to the
relational expressions, we will present some examples.


Consider  the  following  example: 

s:  R  (4 )  .  S  (2) 

R: 1— >2,  R : 2 , 3— >  4 
S: 1->2 
e:  R[ 4=  1  ]s 

The  FD  1,3— >6  will  be  derivable  in  e  by  rules  we  will  present 
shortly.  To  see  what  the  label  of  this  FD  is,  we  will  make  an 
informal  derivation: 

In  the  join  R[ 4=1 ]S,  the  domain  names  of  S  have  been  shifted 
by  four  units  (the  length  of  £)  .  So  the  FD  S : 1— > 2  in  the  join 
can  be  represented  as: 


6 — 2 — S — 1—5, 


where  the  arcs  6 —  and  — 5  represent  the  relabelling.  (In  this 
and  in  the  following  examples,  the  roots  of  trees  will  be  either 
at  the  left  or  at  the  top.)  The  EQ  constraint  4=5  (which  gives 
an  FD  4— >5)  is  valid  in  the  join  R[4=1]S  because  of  the  join- 
condition,  and  so  we  may  add  the  arc  5 — >4  to  the  above  FD  label, 
obtaining: 


6 — 2 — S —  1 — 5 — 4 


as a label for the FD 4->6 which is valid in the join.  The FD
R:2,3->4 is not renamed, so its label in the join is the tree:


       ┌─ 2
4 — R ─┤
       └─ 3


The  FD  R: 1— >2  combines  with  R:2,3->4  by  transitivity  yielding 
1,3— >4,  whose  label  is 


We  again  apply  transitivity  to  derive  in  the  join  the  FD  1,3->6. 
This  FD  can  be  represented  by  the  following  tree  which  is 
obtained  by  merging  the  trees  for  1,3— >4  and  4— >6 : 




When selections occur, there are special nodes which appear in
the label.  For example:

s:  R(3),  R: 1,2->3
e:  R[2='8']


As  we  have  mentioned  before,  the  FD  1— >3  is  valid  in  this 
expression.  Its  label  should  be  the  following: 


The  next  example  illustrates  the  case  when  the  label  may  have 
more  than  one  leaf  with  the  same  domain  name: 


s:  R(4), S(2)

R: 1,2,3->4
S: 1->2

e:  (R[1=2])[1=2]S

In e, we will have the FD 1,3->4 which results from the FD
R:1,2,3->4 and the restriction on R in e.  Its label will reflect
the restriction equating domains 1 and 2:

       ┌─ 1
4 — R ─┼─ 2 — 1
       └─ 3

The EQ from the join-condition, 1=6, will combine with the FD
S:1->2, which in the join is 5->6, to give the FD 5->1 with the
label:


1 — 6 — 2 — S — 1 — 5

Composing the FDs 1,3->4 and 5->1 in e to produce 3,5->4 will
correspond to merging the previous two trees at the '1' nodes.
Since there are two terminal '1' nodes in the tree for 1,3->4,
two copies of the tree for 5->1 must be joined:


       ┌─ 1 — 6 — 2 — S — 1 — 5
4 — R ─┼─ 2 — 1 — 6 — 2 — S — 1 — 5
       └─ 3

These examples suggest the general approach: An FD Z->A on a
base relation R is represented by itself: a tree whose root is
A, whose leaves are the domains in Z and with an interior node
labelled R.  Whenever we derive a new FD by pseudotransitivity,
we "graft" the corresponding labels together: the roots of
copies of one are merged with the matching leaves of the other.
Whenever a new FD is derived from an old one by using a DEQ, we
can add a new label to the appropriate leaves and/or to the root.
The shrinking of the left-hand side of an FD by application of a
VEQ can be reflected in the tree label by adding to the
appropriate leaves a terminal node labelled with the VEQ's value.

The purpose of the FD identifiers just described is to provide
a means to tell when two FDs are the same.  However, the labels
need not be identical for the FDs to be the same.  We will
briefly discuss this matter.

Take the following example:

s:  R(2), S(2)

R: 1->2

e:  R[1=1](S[1=1]R).

The FDs 1->2 and 1->6 in e have by the construction the
respective labels:

2 — R — 1  and  6 — 4 — 2 — R — 1 — 3 — 1 — 3 — 1.

The  two  FDs  are  the  same,  although  their  identifiers  are  not. 
However,  the  identifiers  differ  only  in  the  "renaming"  of  the 
nodes.  This  observation  generalizes  to  the  rule  that  if  any  two 
FDs  have  labels  which  differ  only  in  "renaming  segments",  then 
they  are  the  same  FD. 

There  are  two  other  ways  in  which  FD  identifiers  may  differ 
while  still  describing  the  same  functional  dependency.  We  will 
give  two  examples  to  illustrate: 


s:  R(4),  R: 1,3->4,  R: 1->2,  R: 3->2
    S(3),  S: 1,2->3
e:  R[2,4=2,1]S


In  e  there  is  the  FD  1,3— >7  which  can  be  represented  by  either  of 
the  trees: 


We  must  be  able  to  recognize  these  as  equivalent  FD  labels  even 
though  the  lower-right  portions  are  different. 


The  second  example  is: 


s:  R(4),  R: 1,2->3,  R: 1,4->2,  R: 1,4->3
e:  (R[2=2]R)[1,5='0','0']


In  e,  the  FDs  4— >7  and  4— >3  can  be  associated,  respectively,  with 
the  trees: 


[Figure: tree identifiers for the FDs 4->7 (left) and 4->3 (right).]


We must be able to "collapse" the identifier on the left and
recognize it as equalling the one on the right.




We  now  make  the  preceding  discussions  precise  with  some 
definitions: 

Definition 4.8.  Let s ∈ RSch be a schema.  The set FDid(s) of FD
identifiers is the set of unordered trees defined inductively as
follows:

[1]  If R: Z1...Zn->A is in s, then the tree with root segment
R->A (A is the root) and arcs Zi->R, for i=1,...,n, is in
FDid(s).

[2]  If id ∈ FDid(s) and some leaf nodes of id are labelled X,
then the tree obtained from id by attaching to every such node
the arc Y->, where Y is some positive integer, is also in
FDid(s).

[3]  If id ∈ FDid(s), then attaching an arc ->X out of the
root of id produces another element of FDid(s).

[4]  If id1, id2 ∈ FDid(s), and id2 has a root labelled X, and
id1 has some leaves labelled X, then the tree obtained by merging
the root of a copy of id2 to each leaf of id1 labelled X is also
in FDid(s).

[5]  If id1 ∈ FDid(s), and some of the leaves of id1 are
labelled X, then by attaching to each of these leaves the
"literal" arcs 'v'->, where v is some positive integer, we obtain
another element of FDid(s).

□
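
As an illustration only, the inductive clauses of Definition 4.8 can be
sketched in Python.  The class Node and the helpers base, extend_leaves,
extend_root, graft and show are hypothetical names introduced here, not
part of the formalism; quoted literals are stored as strings such as
"'8'", and arcs are printed as " - ".  The sketch rebuilds the labels of
the join example (R[1=2])[1=2]S discussed earlier.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    # int = domain number, str = relation name or quoted literal like "'8'"
    label: Union[int, str]
    children: List["Node"] = field(default_factory=list)


def base(rel: str, lhs: List[int], rhs: int) -> Node:
    """Clause [1]: root rhs, interior node rel, one leaf per lhs domain."""
    return Node(rhs, [Node(rel, [Node(z) for z in lhs])])


def leaves(t: Node) -> List[Node]:
    return [t] if not t.children else [l for c in t.children for l in leaves(c)]


def extend_leaves(t: Node, x: int, y: Union[int, str]) -> Node:
    """Clauses [2] and [5]: hang a new node y under every leaf labelled x."""
    t = deepcopy(t)
    for leaf in leaves(t):
        if leaf.label == x:
            leaf.children.append(Node(y))
    return t


def extend_root(t: Node, x: int) -> Node:
    """Clause [3]: put a new root x above the old root."""
    return Node(x, [deepcopy(t)])


def graft(id1: Node, id2: Node) -> Node:
    """Clause [4]: merge the root of a copy of id2 into each matching leaf of id1."""
    t = deepcopy(id1)
    for leaf in leaves(t):
        if leaf.label == id2.label:
            leaf.children = deepcopy(id2.children)
    return t


def show(t: Node) -> str:
    if not t.children:
        return str(t.label)
    if len(t.children) == 1:
        return f"{t.label} - {show(t.children[0])}"
    return f"{t.label} - [" + "; ".join(show(c) for c in t.children) + "]"


# FD 1,3->4: R:1,2,3->4 with leaf 2 renamed to 1 by the restriction 1=2.
fd_134 = extend_leaves(base("R", [1, 2, 3], 4), 2, 1)
# FD 5->1: S:1->2 shifted by 4, then the join EQ 1=6 adds a new root 1.
fd_51 = extend_root(extend_root(extend_leaves(base("S", [1], 2), 1, 5), 6), 1)
# FD 3,5->4: one copy of fd_51 grafted at each terminal '1' node.
fd_354 = graft(fd_134, fd_51)
print(show(fd_51))
print(show(fd_354))
```

The grafted tree carries two copies of the label for 5->1, one under
each terminal '1' node, which is the situation clause [4] is meant to
cover.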

Now  we  associate  FD  identifiers  with  FDs  themselves  through 
the  mechanism  of  the  labelled  FD: 

Definition 4.9.  A labelled FD is a statement of the form
e:id:X->Y, where id ∈ FDid(s) and e:X->Y is an FD as previously
defined.  When e is understood, we will write id:Z->A.

□

As  we  have  seen,  we  need  to  be  able  to  reduce  an  FD  identifier  to 
a  "canonical"  form  so  that  two  FD  identifiers  can  be  compared  for 
equivalence.  The  next  definition  specifies  a  function  to  do 
this. 




Definition 4.10.  The function Rdc: FDid(s)->FDid(s) ("Reduce") is
defined to be the result of performing the following
operations wherever possible and as long as possible on id:

(i)  If n1 and n2 are two nodes on a branch such that n1 is
labelled by a positive integer and n2 is labelled by a positive
integer or a literal value (a quoted positive integer), and if
there are only integer-labelled nodes between n1 and n2, then
delete all the intervening nodes in id.  This rule eliminates the
nodes arising from relabelling.

(ii)  If there are subtrees id1 and id2 of id of the respective
forms:

[Figure: id1 has root A above a relation node R with children
Z1,...,Zn; id2 has root B above R with children Z1',...,Zm'.]

such that {Z1',...,Zm'} ⊆ {Z1,...,Zn} and such that the
respective descendants of the Zs are the same, and if there is an
FD R:Z1''...Zk''->B such that {Z1'',...,Zk''} ⊆ {Z1,...,Zn} and
Z1''...Zk'' < Z1'...Zm' according to, say, a lexicographic
ordering, then replace id2 by id3:

[Figure: id3 has root B above R with children Z1'',...,Zk''.]

where the descendants of the Z''s are the same as the descendants
of the corresponding Zs.  When one part of the tree contains
an FD R:Z->A, all FDs of the form R:Z1->B, where Z1 ⊆ Z and
the subtrees of the Zs are the same, will be equivalent.  This
rule picks out a standard FD from the several possibilities.

(iii)  If there is a subtree id1 of id of the form:

[Figure: id1 has root A above R with children Z1,...,Zn, where
the subtree below Zi is again a relation node R with children
Z1',...,Zm'.]

such that {Z1,...,Zi-1,Zi+1,...,Zn} ⊆ {Z1',...,Zm'} and the
corresponding Zs have the same descendants, then replace id1 by
the following subtree:

[Figure: root A above R with children Z1',...,Zm'.]

where the descendants of the Zs are the same as above.  This rule
simplifies composed FDs.

(iv)  Finally, if the top of id has the form m — n — R (rather than
just n — R), remove the arc m —, and if a terminal branch of id
has the form R — n — n (rather than just R — n) then remove the arc
— n.

□


The following are some examples of using the Rdc function:


3 — 8 — 2 — S — 1 — 4 — 5  ==>  2 — S — 1 — 5


2 — R — 3 — 1 — '6'  ==>  2 — R — 3 — '6'


1 — R — 2 — 3 — 4 — S — 5  ==>  1 — R — 2 — 4 — S — 5
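
The examples above use only rule (i) and rule (iv) of the Reduce
function.  A minimal Python sketch of just those two rules follows;
rules (ii) and (iii), which need the schema's FDs, are deliberately left
out.  All names (Node, chain, collapse, trim_top, trim_leaves,
rdc_partial, show) are hypothetical, quoted literals are stored as
strings such as "'6'", arcs are printed as " - ", and the reading of the
terminal case of rule (iv) as "drop a leaf that repeats its parent's
label" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: Union[int, str]  # int = domain, str = relation name or "'literal'"
    children: List["Node"] = field(default_factory=list)


def is_int(n: Node) -> bool:
    return isinstance(n.label, int)


def is_lit(n: Node) -> bool:
    return isinstance(n.label, str) and n.label.startswith("'")


def is_rel(n: Node) -> bool:
    return isinstance(n.label, str) and not n.label.startswith("'")


def chain(*labels) -> Node:
    """Build a single-branch identifier, root first."""
    node = None
    for lab in reversed(labels):
        node = Node(lab, [] if node is None else [node])
    return node


def collapse(n: Node) -> Node:
    """Rule (i): shrink every run of integer nodes to its two end points."""
    if not is_int(n):
        return Node(n.label, [collapse(c) for c in n.children])
    last = n
    while len(last.children) == 1 and is_int(last.children[0]):
        last = last.children[0]
    if len(last.children) == 1 and is_lit(last.children[0]):
        # n2 is the literal: every integer after n lies "between" n1 and n2
        return Node(n.label, [Node(last.children[0].label)])
    rest = [collapse(c) for c in last.children]
    return Node(n.label, rest if last is n else [Node(last.label, rest)])


def trim_top(n: Node) -> Node:
    """Rule (iv), top: m - n - R becomes n - R."""
    if is_int(n) and len(n.children) == 1:
        child = n.children[0]
        if is_int(child) and len(child.children) == 1 and is_rel(child.children[0]):
            return child
    return n


def trim_leaves(n: Node) -> Node:
    """Rule (iv), terminal (assumed reading): R - n - n becomes R - n."""
    kids = [trim_leaves(c) for c in n.children]
    if is_rel(n):
        kids = [Node(k.label) if is_int(k) and len(k.children) == 1
                and not k.children[0].children
                and k.children[0].label == k.label else k for k in kids]
    return Node(n.label, kids)


def rdc_partial(n: Node) -> Node:
    return trim_leaves(trim_top(collapse(n)))


def show(t: Node) -> str:
    if not t.children:
        return str(t.label)
    if len(t.children) == 1:
        return f"{t.label} - {show(t.children[0])}"
    return f"{t.label} - [" + "; ".join(show(c) for c in t.children) + "]"


print(show(rdc_partial(chain(3, 8, 2, "S", 1, 4, 5))))
print(show(rdc_partial(chain(6, 4, 2, "R", 1, 3, 1, 3, 1))))
```

Under these assumptions the long label 6 - 4 - 2 - R - 1 - 3 - 1 - 3 - 1
from the earlier join example reduces to the same tree as the base
label 2 - R - 1, which is the equivalence the text asks for.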

The function Rdc constitutes one part of our algorithm for
determining valid constraints on relational algebra expressions.
The next definition specifies the inference rules which we have
already informally discussed for constraints on a single
expression.  This constitutes the second part of the algorithm.



Definition 4.11.  Let s ∈ RSch; let e ∈ Exp(s) and let S be a set
of EQs and labelled FDs defined on e.  The closure Cl(S) of S is
defined by the following inductive rules:

[1]   S ⊆ Cl(S).

[2]   X=Y ∈ Cl(S)  =>  Y=X ∈ Cl(S).

[3]   X=Y, Y=Z ∈ Cl(S)  =>  X=Z ∈ Cl(S).

[4]   X=Y, Y=V ∈ Cl(S)  =>  X=V ∈ Cl(S).

[5]   X=V, Y=V ∈ Cl(S)  =>  X=Y ∈ Cl(S).

[6]   X=X ∈ Cl(S).

[7]   X=Y ∈ Cl(S)  =>  id:X->Y ∈ Cl(S), where id is X->Y.

[8]   X=V ∈ Cl(S)  =>  id:∅->X ∈ Cl(S), where id is 'V'->X.

[9]   id1:Z->A1, id2:Z->A2 ∈ Cl(S) and Rdc(id1)=Rdc(id2)  =>
      A1=A2 ∈ Cl(S).

[10]  id:Z->A ∈ Cl(S)  =>  id:Z+X->A ∈ Cl(S).

[11]  id1:Z->A, id2:A+X->B ∈ Cl(S)  =>  id3:Z+X->B ∈ Cl(S), where
      id3 is obtained from id2 by merging a copy of id1 to every
      leaf of id2 labelled A.

□
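
The equality rules [2] through [6] of the closure can be run as a
simple fixed-point computation.  The following Python sketch covers
only those rules (the FD rules [7] to [11] are not attempted).  The
representation is an assumption made for illustration: a domain
equality X=Y is the pair (X, Y) of integers, a value equality X=V is
the pair (X, "'V'") with the value kept as a quoted string, and rule
[6] is restricted to the domains 1..degree so that the result is
finite.  The name eq_closure is hypothetical.

```python
from typing import Set, Tuple, Union

Term = Union[int, str]   # int = domain number, str = quoted value such as "'5'"
EQ = Tuple[int, Term]


def eq_closure(eqs: Set[EQ], degree: int) -> Set[EQ]:
    """Close a set of EQs under rules [2]-[6] (rule [1] is the seed)."""
    closed = set(eqs) | {(x, x) for x in range(1, degree + 1)}   # [1], [6]
    while True:
        new = set()
        for a, b in closed:
            if isinstance(b, int):
                new.add((b, a))                                   # [2] symmetry
            for c, d in closed:
                if isinstance(b, int) and b == c:
                    new.add((a, d))                               # [3], [4]
                if isinstance(b, str) and b == d:
                    new.add((a, c))                               # [5]
        if new <= closed:
            return closed
        closed |= new


cl = eq_closure({(1, 2), (2, "'5'")}, degree=3)
print(sorted(cl, key=str))
```

From 1=2 and 2='5' the sketch derives 2=1 and 1='5', while domain 3
remains equal only to itself.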


We have specified inference rules for constraints on one
expression.  Now we want to give rules for generating constraints
on expressions from constraints derived on subexpressions.  First
we will informally discuss how this should be done, and then we
will present the formal rules.

For a base relation R, we take the FDs in the schema belonging
to R and label them by themselves.  Then the closure operator is
applied.

For projections, two things come into consideration.  First,
in order for an FD or an EQ to "survive" a projection, all of the
referenced domains must be in the projection.  Second, we must
rename the domains by the order of their appearance in the
projection list.  Take the following example:

s:   R(10),  R: 2,4,6->8
e1:  R[2,4,6,8]
e2:  R[1,2,3,4,6,8]
e3:  R[8,7,6,2,4]
e4:  R[2,6,7,8]




In the projection e1, the FD appears as 1,2,3->4.  In e2, the FD
is 2,4,5->6.  In e3, it is 3,4,5->1, and in e4, there is no FD.
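
The renaming in these four projections can be checked mechanically.
The sketch below is illustrative only: project_fd is a hypothetical
helper, it reads e1 as R[2,4,6,8] and e2 as R[1,2,3,4,6,8] (the lists
implied by the stated results), and when a domain occurs more than once
in the projection list it simply uses the first position.

```python
from typing import List, Optional, Tuple


def project_fd(x: List[int], lhs: List[int], rhs: int) -> Optional[Tuple[List[int], int]]:
    """Rename the FD lhs->rhs through the projection list x, or return None
    when some referenced domain is projected away."""
    if rhs not in x or any(z not in x for z in lhs):
        return None

    def pos(d: int) -> int:
        return x.index(d) + 1          # positions are 1-based

    return (sorted(pos(z) for z in lhs), pos(rhs))


fd = ([2, 4, 6], 8)                    # R: 2,4,6->8
for name, x in [("e1", [2, 4, 6, 8]), ("e2", [1, 2, 3, 4, 6, 8]),
                ("e3", [8, 7, 6, 2, 4]), ("e4", [2, 6, 7, 8])]:
    print(name, project_fd(x, *fd))
```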

If C is the set of derivable constraints for some expression e,
then to get the set of derivable constraints for a projection
e[X], we need to find all the constraints in C which survive the
projection, but we do not need to apply the closure operator.
(Projection is like an algebraic homomorphism; the result is
already closed under the operations.)  The reason is that C is
already closed, and if one or more constraints in C survive the
projection and also combine according to one of the rules
[1]-[11] defining closure, then the resulting constraint will
also survive projection because every domain in the result
appears in at least one of the original constraints.

Notationally, the condition for Z->A to be valid for e[X] is that
X[Z]->X[A] be valid in e.  The notation X[Z] is like array
subscription: An attribute i is in X[Z] if and only if there is
some j ∈ Z such that i is in the j-th position of X.  For
example, if X is 2,8,4,6 and Z is {2,4}, then X[Z] is the set
{6,8}.
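
The subscript notation is easy to mis-read, so a two-line Python
rendering may help.  The function name subscript is hypothetical;
positions are 1-based as in the text.

```python
from typing import Iterable, List, Set


def subscript(x: List[int], z: Iterable[int]) -> Set[int]:
    """X[Z]: the attributes found at the positions of x listed in z."""
    return {x[j - 1] for j in z}


print(subscript([2, 8, 4, 6], {2, 4}))
# The FD 3,4,5->1 on R[8,7,6,2,4] corresponds to X[Z]->X[A] on R:
print(subscript([8, 7, 6, 2, 4], {3, 4, 5}), subscript([8, 7, 6, 2, 4], {1}))
```

The second call maps the projected FD 3,4,5->1 back to the base FD
2,4,6->8, which is the direction in which the validity condition is
stated.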

In cross products, renaming also occurs.  Given relations R(m)
and S(n), the domains of S in the cross product R⊗S have been
"shifted" by the length m of R.  That is, they number m+1 through
m+n.  All constraints of S are valid in the cross product, but
they have also been renamed accordingly.  We also need to take the
closure.
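
A small Python sketch of the shift, under the same illustrative
encoding as before (domain equalities as integer pairs, value
equalities with a quoted string on the right).  The helper names
shift_eq and shift_fd are hypothetical, and the closure step that
follows the renaming is not shown.

```python
from typing import List, Tuple, Union

Term = Union[int, str]


def shift_eq(eq: Tuple[int, Term], m: int) -> Tuple[int, Term]:
    """Rename an EQ of the right-hand operand; literal values are untouched."""
    x, y = eq
    return (x + m, y + m if isinstance(y, int) else y)


def shift_fd(lhs: List[int], rhs: int, m: int) -> Tuple[List[int], int]:
    """Rename an FD of the right-hand operand by the degree m of the left one."""
    return ([z + m for z in lhs], rhs + m)


# S:1->2 next to a relation of degree 4 becomes 5->6.
print(shift_fd([1], 2, 4))
print(shift_eq((1, 2), 4), shift_eq((1, "'0'"), 4))
```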

For a restriction e[X=Y], we add the constraint X=Y to the
ones already holding in e.  For a selection e[X=V], we similarly
add X=V to the constraints of e.  Then the closure operator is
applied.

For a union e1+e2, a constraint basically must hold in both
components in order to hold in e1+e2; any constraint valid on
only one component can be violated by tuples in the other
component, and these violating tuples will appear in the union.

To get the EQs valid in e1+e2, we simply take the intersection of
the sets of valid EQs for e1 and e2.  To calculate the valid FDs,
the FDids must be considered; i.e., "intersection" must be
generalized to take into account FD identifiers.  Consider the
following example:

s:   R(2), S(2)

R: 1->2,  S: 1->2

e1:  R+S

e2:  R[2='2'] + R[2='5']

In e1, the FD 1->2 does not hold even though the FD 1->2 is valid
in both components of the union.  For example, in the state st,
where R(st) = {(0,0)} and S(st) = {(0,1)}, the FDs on the base
relations are true, but R+S = {(0,0),(0,1)}, which violates the FD
1->2.

Now consider the expression e2.  Clearly the FD 1->2 will be
valid in this expression (since, e.g., R[2='2'] + R[2='5'] ⊆ R).
This is because 1->2 was the "same" FD in each component.  In
general, for an FD id:Z->A to appear in e1+e2, it must be valid
in both e1 and e2 and have the "same" id in both components.
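
The counterexample can be replayed directly.  In the Python sketch
below, holds_fd is a hypothetical checker, relations are sets of
tuples, and the state chosen for e2 is an arbitrary illustration (any
state of R satisfying 1->2 would do).

```python
from typing import Iterable, List, Tuple


def holds_fd(rel: Iterable[Tuple], lhs: List[int], rhs: int) -> bool:
    """True when the FD lhs->rhs (1-based positions) holds in rel."""
    seen = {}
    for t in rel:
        key = tuple(t[z - 1] for z in lhs)
        if seen.setdefault(key, t[rhs - 1]) != t[rhs - 1]:
            return False
    return True


# e1 = R+S in the state from the text
R, S = {(0, 0)}, {(0, 1)}
print(holds_fd(R, [1], 2), holds_fd(S, [1], 2), holds_fd(R | S, [1], 2))

# e2 = R[2='2'] + R[2='5'] in an illustrative state of R
R2 = {(1, 2), (3, 5), (4, 7)}
e2 = {t for t in R2 if t[1] == 2} | {t for t in R2 if t[1] == 5}
print(e2 <= R2, holds_fd(e2, [1], 2))
```

The FD holds in each operand of e1 but fails in the union, whereas e2
is a subset of R and so inherits the FD.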

The constraints valid on a set difference will be the
constraints valid on the first component.  There will be at least
that many valid constraints because the difference is a subset of
the first component.  More will be said about set difference
later.

This  discussion  is  formalized  by  the  following  definition: 




Definition 4.12.  Let s ∈ RSch and e ∈ Exp(s).  The set Drv(e) of
derivable constraints on e is defined by the following rules
which use induction on the number of operations in e:

[1]  Drv(R) = Cl(all FDs and EQs belonging to R in s), except
     that each FD R:Z->A of R is labelled by the tree
     consisting of the root segment R->A and an arc d-> into
     node R for each d ∈ Z.  That is, even though an FD may
     have a complicated derivation from the constraints in s,
     its identifier will be a simple tree.

[2]  Drv(e[X]) = {id2:Z->A : id1:X[Z]->X[A] ∈ Drv(e)} + {Y=Z :
     X[Y]=X[Z] ∈ Drv(e)} + {Y=V : X[Y]=V ∈ Drv(e)}, where id2
     is obtained from id1 by attaching to each leaf X[n] of id1
     the arc n->, and to the root X[A] of id1 the arc ->A.

[3]  Drv(e[X=Y]) = Cl(Drv(e) + {X=Y})

[4]  Drv(e[X=V]) = Cl(Drv(e) + {X=V})

[5]  Drv(e1⊗e2) = Cl(Drv(e1) + Drv'(e2)), where Drv'(e2) is
     Drv(e2) renamed, i.e., X=Y ∈ Drv(e2) implies X+k=Y+k ∈
     Drv'(e2), X=V ∈ Drv(e2) implies X+k=V ∈ Drv'(e2) and
     id1:Z->A ∈ Drv(e2) implies id2:Z+k->A+k ∈ Drv'(e2), where
     id2 is obtained from id1 by adding an arc d+k-> to each
     leaf labelled d and an arc ->n+k to the root labelled n,
     and where k=deg(e1).

[6]  Drv(e1+e2) = {x ∈ Drv(e1)∩Drv(e2) : x is an EQ} + {id1:Z->A
     ∈ Drv(e1) : there is id2:Z->A in Drv(e2) with
     Rdc(id1)=Rdc(id2)}

[7]  Drv(e1-e2) = Drv(e1).

We will say that e is consistent (with respect to s) if for no
domain X and for no distinct values V1, V2 are both X=V1 and X=V2
members of Drv(e).

□
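
The consistency condition at the end of the definition amounts to a
single pass over the value equalities.  The Python sketch below assumes
the constraint set has already been computed and is given as pairs,
with value equalities carrying a quoted string on the right; the name
is_consistent and this encoding are illustrative assumptions.

```python
from typing import Iterable, Tuple, Union

Term = Union[int, str]   # int = domain, str = quoted value such as "'2'"


def is_consistent(eqs: Iterable[Tuple[int, Term]]) -> bool:
    """False when some domain is equated to two distinct values."""
    value_of = {}
    for x, v in eqs:
        if isinstance(v, str) and value_of.setdefault(x, v) != v:
            return False
    return True


# R[2='2'][2='5'] yields both 2='2' and 2='5', so it is inconsistent.
print(is_consistent({(2, "'2'"), (2, "'5'")}))
print(is_consistent({(1, 2), (2, "'2'"), (1, "'2'")}))
```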

The above formulation of Drv was the natural one in that it
extended a known complete set of rules for functional
dependencies on one relation.  However, these definitions prove
awkward in succeeding proofs mainly because of the rule for
augmentation (rule [10] of Definition 4.11).  When this rule is
used, the left-hand side of the FD will not correspond exactly to
the integer (non-literal) leaves of the FD identifier since the
augmentation rule does not change the FD identifier while it does
change the left-hand side of the FD.  In the next definition and
the next theorem, we will present modified versions of the
closure operator and the Drv operator which do not use
augmentation.  We will show that there is no essential loss of
power with the new definitions.  These definitions will, however,
provide stronger ties between an FD and its identifier.  (Some
other formulation for Cl may be possible which loses no
information with augmentation; however, we have not discovered
one.)

Definition 4.13.  Let s ∈ RSch, e ∈ Exp(s) and let S be a set of
EQs and labelled FDs on e.  Then the (modified) closure Cl1(S) of
S is defined as follows:

Cl1(S) = Cl(S) - {Z->A : for some Z1 ⊆ Z, with Z1≠Z, Z1->A ∈
Cl(S)}.

The set Drv1(e) of (modified) derivable constraints on e is
defined analogously to the definition of Drv (Definition 4.12)
with 'Cl' replaced by 'Cl1' and 'Drv' by 'Drv1'.

□

In the above definition we did not simply say that Cl1 is like
Cl with rule [10] (augmentation) removed because we could still
have derivations such as the following one:

1,2->3,  3->2  =>  1,2->2.
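
The set subtracted in the definition of the modified closure can be
computed by a direct filter.  The Python sketch below ignores
identifiers and treats an FD as a pair (left-hand side, right-hand
side); drop_augmented is a hypothetical name.  In the example, 2->2 is
assumed to be present because rules [6] and [7] of Definition 4.11
supply it.

```python
from typing import FrozenSet, Set, Tuple

FD = Tuple[FrozenSet[int], int]


def drop_augmented(fds: Set[FD]) -> Set[FD]:
    """Keep Z->A only if no proper subset Z1 of Z has Z1->A in the set."""
    return {(z, a) for (z, a) in fds
            if not any(a2 == a and z1 < z for (z1, a2) in fds)}


fs = frozenset
closed = {(fs({1, 2}), 3), (fs({3}), 2), (fs({1, 2}), 2), (fs({2}), 2)}
print(drop_augmented(closed))
```

The derived FD 1,2->2 is removed because 2->2 already has a strictly
smaller left-hand side, while 1,2->3 and 3->2 survive.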

The next theorem relates Drv and Drv1.  Drv1 does not include
augmentation, but the theorem says that all constraints generated
by Drv can be obtained from Drv1 by applying augmentation once as
the very last operation.  The ideas are simple; the proof,
however, is a moderately intricate induction.

Theorem 4.2.  Let s ∈ RSch, e ∈ Exp(s), and consider the
following property P with respect to sets S and S1 of constraints
on e:

If c is an EQ, then c ∈ S if and only if c ∈ S1, and
id:Z->A ∈ S if and only if for some Z1 ⊆ Z, Z1->A ∈ S1.

Then Drv(e) and Drv1(e) have property P.

Proof.  The proof may be found in the appendix.

□

Our algorithm now consists of the functions Drv (Drv1), Cl
(Cl1) and Rdc.  With these rules, a solution may be proposed to
Problem 2 posed earlier:

Proposition.  Given s ∈ RSch and e ∈ Exp(s), a constraint e:c is
valid in s if and only if c ∈ Drv(e).

The proof of this proposition has two parts.  First, are the
rules sound?  That is, are any invalid FDs or EQs generated?
Second, are the rules complete?  Do they find all of the valid
FDs and EQs?  The question of soundness is considered in the next
section, and we study completeness in the next chapter.

4.3.  Soundness of the Derivation Rules

The formal development of the operator Drv was in three parts:
First, Rdc and Cl were defined and then Drv in terms of Rdc and
Cl.  To prove the soundness of the rules, we first prove some
properties of Rdc and Cl and then properties of Drv.

Functional dependencies, as their name implies, are closely
related to functions.  If, say, in R(n) there is the FD Z->A,
then given any state st, the projection R[Z,A](st) defines a
partial function U**k->U, where k is the number of domains in Z.
Similarly, given any state, FD identifiers define partial
functions from their leaf domains to their root domains.  It is
convenient to take as the domain of these functions the set of
infinite sequences of elements of the universe.  In this way, all
functions will have the same domain.  The function we associate
with an identifier id will then actually depend only on the
positions in the sequence corresponding to the labels of the
leaves of id.  Given a state st, we will denote the function
determined by an identifier id by id(st) or id(st; ).  We will
first define the functions determined by identifiers, and then we
will compare these functions with the FDs to which the FDids are
attached.

Definition 4.14.  Let s ∈ RSch, st ∈ RSt(s) and id ∈ FDid(s).
The partial function id(st) whose domain is the set of sequences
of universe elements is defined by the following Algol-like
procedure, where x is a sequence of domain values:

id(st;x) =
    if id is an integer leaf i then x(i)
    else if id is a literal leaf 'v' then v
(*) else if succ(id) is an integer node then succ(id)(st;x)
    else /* succ(id) is a relation name R; root(id) is an
            integer A, and the set of children of R is
            {id1,...,idn} whose roots are labelled Z1,...,Zn */
        if id1(st;x),...,idn(st;x) are defined
        then begin
            v1 <- id1(st;x);
            ...
            vn <- idn(st;x);
            if R[Z1,...,Zn=v1,...,vn][A](st) = {a}
                /* i.e., if there is a tuple in R(st) whose
                   Z-values equal the v's */
            then a  /* i.e., return the unique A-value */
            else undefined
        end
        else undefined

□
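
For readers who prefer executable code, the procedure can be
transcribed into Python roughly as follows.  This is a sketch under
stated assumptions, not the thesis's own formulation: Node, Lit and
evaluate are hypothetical names, a state is a dictionary from relation
names to sets of tuples, the sequence x is a dictionary from positions
to values, None stands for "undefined", and a literal successor is
treated like an integer successor in line (*).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Lit:
    value: Any            # a quoted literal such as '8'


@dataclass
class Node:
    label: Any            # int = domain, str = relation name, Lit = literal
    children: List["Node"] = field(default_factory=list)


State = Dict[str, Set[Tuple[Any, ...]]]


def evaluate(node: Node, state: State, x: Dict[int, Any]) -> Any:
    """Return id(st;x), or None where the partial function is undefined."""
    if not node.children:
        if isinstance(node.label, Lit):
            return node.label.value          # literal leaf 'v'
        return x.get(node.label)             # integer leaf i gives x(i)
    succ = node.children[0]
    if not isinstance(succ.label, str):      # line (*): skip a renaming node
        return evaluate(succ, state, x)
    # succ is a relation node R; node.label is the root domain A of R
    wanted = []
    for child in succ.children:
        v = evaluate(child, state, x)
        if v is None:
            return None
        wanted.append((child.label, v))      # (Zi, vi)
    hits = {t[node.label - 1] for t in state[succ.label]
            if all(t[z - 1] == v for z, v in wanted)}
    return hits.pop() if len(hits) == 1 else None


state = {"R": {(1, 10), (2, 20)}, "T": {(1, 8, 5), (1, 9, 6)}}
fd = Node(2, [Node("R", [Node(1)])])                         # 2 - R - 1
shifted = Node(4, [Node(2, [Node("R", [Node(1, [Node(3)])])])])
selected = Node(3, [Node("T", [Node(1), Node(2, [Node(Lit(8))])])])
print(evaluate(fd, state, {1: 2}), evaluate(shifted, state, {3: 2}))
print(evaluate(selected, state, {1: 1}), evaluate(fd, state, {1: 3}))
```

The renamed identifier reads its input from position 3 instead of
position 1 but computes the same function of that input, which is the
observation behind line (*).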


Now identifiers were not associated with FDs in a haphazard
fashion.  They were chosen so that the function determined by the
identifier essentially equals the function determined by the FD.
To decide, as in rule [9] of Definition 4.11, if A1=A2 should be
inferred from id1:Z->A1 and id2:Z->A2 is then a question of
deciding if id1(st) = id2(st) for all states st, i.e., if the
functions determined by the identifiers are equal.  We are given
as an hypothesis of rule [9] in Definition 4.11 the equality
Rdc(id1) = Rdc(id2).  Hence, if we can show that the Rdc operator
does not affect the associated function, i.e., that id(st) =
Rdc(id)(st) for all st, then we will know when it is okay to
derive A1=A2.  This is formally noted in the following theorem:

Theorem 4.3.  Let s ∈ RSch and id ∈ FDid(s).  Then for all st ∈
RSt(s), id(st) = Rdc(id)(st).

Proof.  According to line (*) in the definition of id(st), the
presence of interior integer nodes in id does not affect the
value of the function.  Hence, deleting these interior nodes will
not affect the function.

Clause (ii) of Rdc will not affect the value, and this can be
seen as follows:  Suppose id1, id2 and id3 have the indicated
form.  In evaluating id1, id2 and id3, we will compute the
values:

v1 for Z1, ..., vn for Zn,

v1' for Z1', ..., vm' for Zm',

v1'' for Z1'', ..., vk'' for Zk''.

The corresponding v's will be the same since the corresponding Zs
have identical descendants.  Suppose id(st;x) is defined.  Then
id1(st;x), id2(st;x) and id3(st;x) are defined.  This means there
is a tuple t in R[Z1...Zn=v1...vn](st).  This tuple will also be
in R[Z1'...Zm'=v1'...vm'](st) and in R[Z1''...Zk''=v1''...vk''](st).
Therefore, we will have id2(st;x) = id3(st;x) = t[B] and
replacing id2 by id3 will not affect the value of id.

Suppose id(st;x) is not defined.  If id1(st;x) is not defined,
then replacing id2 by id3 will not have any effect.  If id1(st;x)
is defined, then both id2(st;x) and id3(st;x) are defined, and
replacing id2 by id3 will still have no effect as shown above.

Clause (iii) of Rdc will not affect the value, and this can be
seen as follows:  let id1 be as described in clause (iii), and
let id2 be the displayed subtree of id1, and id3 be id1 after the
change.  In evaluating id1(st;x), the following values will be
computed:

v1 for Z1, ..., vn for Zn,

v1' for Z1', ..., vm' for Zm',

where vi=id2(st;x).  If id(st;x) is defined, then id1(st;x) and
id2(st;x) are defined, and so there is a tuple t ∈
R[Z1'...Zm'=v1'...vm'](st).  Now vi=t[Zi] and from the hypotheses
of clause (iii), {Z1,...,Zi-1,Zi+1,...,Zn} ⊆ {Z1',...,Zm'}, and
their corresponding values are the same.  Hence t ∈
R[Z1...Zn=v1...vn](st).  This means that t[A] ∈
R[Z1'...Zm'=v1'...vm'][A](st) and t[A] ∈
R[Z1...Zn=v1...vn][A](st), i.e., id1(st;x)=id3(st;x).  If
id(st;x) is not defined, then if id1(st;x) is defined, replacing
id1 by id3 will still result in id(st;x) undefined.  If id1(st;x)
is undefined, it is for two possible reasons:  Some vj is
undefined, or each vj is defined, but R[Z1...Zn=v1...vn](st) is
empty.  In the first case, we must have
R[Z1'...Zm'=v1'...vm'](st) empty, and this implies that id3(st;x)
is undefined.  The second case cannot occur because we know that
vi=id2(st;x) is defined, and also that ∅ ≠
R[Z1'...Zm'=v1'...vm'](st) ⊆ R[Z1...Zn=v1...vn](st).

□


When  we  go  to  prove  completeness,  it  will  be  necessary  to  know 
that  the  mapping  from  reduced  identifiers  to  the  functions  of 
Definition  4.14  is  one-to-one,  that  is,  that  different  reduced 
identifiers  correspond  to  different  functions.  We  have  included 
this  very  important  theorem  here  since  the  related  definitions 
are  closer  at  hand. 

Theorem 4.4.  Let s ∈ RSch; let e ∈ Exp(s), and suppose id1 and
id2 are reduced identifiers which can be associated with
(identifiers of) FDs in some Drv1 set.  Then id1≠id2 implies that
there is a state st and a valuation x such that
id1(st;x) ≠ id2(st;x).

Proof.  See the Appendix.

□

We have claimed that the function determined by an FD Z->A
defined on an expression e is essentially the same as the
function determined by the associated identifier.  The precise
meaning of "essentially" is the following:  Suppose a state st
and a sequence x are given.  The values of x which are relevant to
the FD are x[Z].  The application of the function to the input x
corresponds to the selection expression e[Z=x[Z]].  To extract
the A-value, the projection e[Z=x[Z]][A] is used.  If the
function is defined at st and x, there is one element in
e[Z=x[Z]][A](st), otherwise it is empty.  The "essential"
equality is then a ∈ e[Z=x[Z]][A](st) <=> a=id(st;x).  We do not
prove this equality separately but together with the soundness
statement itself.

With the above constructions, we can prove the soundness of
the closure operator.  We hypothesize two properties of a set S
of constraints.  One property says that all elements of S are
valid constraints.  The other property is a statement that FD
identifiers "agree" with the FDs as discussed above.  The theorem
then states that the closure operator preserves these two
properties.  Of course, we are primarily interested in the
preservation of validity.  However, the second property is needed
during the induction steps and for the completeness theorem.

Theorem 4.5.  Let s ∈ RSch and e ∈ Exp(s), and consider the
following two statements:  The first applies to any FD or EQ c on
e, and the second applies to any labelled FD id:Z->A on e:

(i)  c is valid

(ii)  (given id:Z->A) for every st ∈ RSt(s) and every sequence
x of universe elements, a ∈ e[Z=x[Z]][A](st) if and only if
a = id(st;x).

Let S be a set of EQs and labelled FDs defined on an
expression e such that (i) holds for each element of S and (ii)
holds for each FD of S.  Then (i) holds for each element of
Cl1(S) and (ii) holds for each FD of Cl1(S).

Proof.  We  assume  (i)  holds  for  each  element  of  S  and  (ii)  holds 
for  each  FD  of  S. 

If  c  €  rn  (S)  is  present  by  clause  £1]  of  Definition  4.11, 
then  the  properties  hold  for  c  by  hypothesis. 

If  c  e  C11  (S)  is  present  by  one  of  clauses  £2]  through  £6], 
then  property  (i)  holds  because  of  the  basic  properties  of 
equality. 

If  id:  X->Y  €  C11(S)  because  X=Y  €  C11(S)  (clause  £7]),  than 
X->Y  is  valid  because  equality  is  functional  (and  valid  by  the 


(Ill,  4)  112 


induction  hypotnesis) -  Also,  for  any  state  st  and  sequence  x, 
a  €  e[  X= x[  X  ]  ][  Y  ]  (st )  if  and  only  if  for  some  t  €  e(st),  x[X]  = 
t[  X  ]  =  t[  Y  ]  =  a;  i.e.,  if  and  only  if  a  =  id(st;x). 

If  id:  &->X  6  Cll(S)  because  X^V  e  C11(S)  (clause  [8]),  then 
> S->X  is  valid  because  the  X  domain  is  constant-  Also,  for  any 
stare  st  and  seguence  x,  a  6  €  [j*=x[  {/]  ][  X  ]  (st)  =  e[X](st),  if  and 
only  if  a  =  V  =  id(st;x). 

Suppose  id1:Z— >A 1  and  id2: Z— >A 2  are  in  C 1 1 ( S)  and  statement 
(ii)  above  is  true  for  them  and  also  suppose  Edc  (id  1 ) =Bdc  (id2 ) . 
Then  for  every  state  st  and  any  t  6  e  (st)  ,  t[A1]  € 
e[  Z=t[  Z  ]  ][  A1  ]  (st)  ,  so  t[  A  1  ]=id  1  (st ;  x)  =  Edc  (id  1)  ( st ;  x)  and  t[  A2] 

6  e[Z=t[  Z  ]  ][  A  2  J  (st)  ,  so  t[A2]  =  id2(st;x)  =  Edc(idl)  (st  ;x)  ,  where 
x[  Z  ]=t[  Z  ]  -  Hence  t[A1]=t[A2]  and  so  A1  =  A2  is  valid  for  e- 

Suppose id1:Z->A and id2:A+X->B are in Cl1(S) and satisfy (i)
and (ii), and that id3:Z+X->B is obtained per clause [11]. By
inspection of the algorithm, we can see that id3(st)(x) =
id2(st)(x[id1(st)(x)/A]) (where x[y/i] denotes the sequence
obtained from x by putting y at position i). By definition, b ∈
e[Z+X=x[Z+X]][B](st) if and only if there is a t ∈ e(st) such
that t[B]=b and t[Z+X]=x[Z+X]. This is true if and only if there
is a t ∈ e(st) such that t[B]=b and t[A+X] = x[t[A]/A][A+X] and
t[Z]=x[Z]. By induction, we know that b ∈
e[A+X=x[t[A]/A][A+X]][B](st) if and only if b =
id2(st;x[t[A]/A]), and that t[A] ∈ e[Z=x[Z]][A](st) if and only
if t[A] = id1(st;x). Hence b ∈ e[Z+X=x[Z+X]][B](st) if and only
if b = id2(st;x[t[A]/A]) = id3(st;x).

□
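The composition step of clause [11] can be illustrated with a small
sketch. It assumes a state is just one finite relation held as a Python
set of tuples, and that an identifier for an FD is a lookup function on
that relation; the helper names make_identifier, substitute and compose
are hypothetical and do not come from the report.

```python
def make_identifier(rel, lhs, target):
    """Return id(x): the value in column `target` of the tuple of `rel`
    that agrees with the sequence x on the columns in `lhs` (1-based).
    Assumes the FD lhs -> target holds in `rel`."""
    def ident(x):
        values = {t[target - 1] for t in rel
                  if all(t[c - 1] == x[c - 1] for c in lhs)}
        # The FD guarantees at most one value; None if nothing matches.
        return values.pop() if len(values) == 1 else None
    return ident


def substitute(x, value, pos):
    """x[value/pos]: copy of x with `value` placed at 1-based position pos."""
    return x[:pos - 1] + (value,) + x[pos:]


def compose(id1, id2, a_pos):
    """id3(x) = id2(x[id1(x)/A]), as in the clause [11] argument."""
    return lambda x: id2(substitute(x, id1(x), a_pos))


# Columns: 1=Z, 2=A, 3=X, 4=B.  FDs: Z -> A and A,X -> B.
rel = {(1, 10, 5, 100), (2, 20, 5, 200), (3, 10, 6, 300)}
id1 = make_identifier(rel, lhs=[1], target=2)      # Z -> A
id2 = make_identifier(rel, lhs=[2, 3], target=4)   # A,X -> B
id3 = compose(id1, id2, a_pos=2)                   # Z,X -> B

x = (2, None, 5, None)   # only positions Z and X are filled in
print(id3(x))            # -> 200
```

The composed function only reads positions Z and X of x, which mirrors
the fact that id3 identifies the FD Z+X->B.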


With this theorem, it is now easy to show the soundness of
Drv1. We again must prove the extra clause (ii) as in Theorem
4.5 in order to use this last theorem.

Theorem 4.6. Let s ∈ RSch and let e ∈ Exp(s). Then

(i) every statement in Drv1(e) is valid, and

(ii) if id:Z->A is in Drv1(e), st ∈ RSt(s) and x is a sequence
of universe elements, then a ∈ e[Z=x[Z]][A](st) if and only if
a=id(st;x).

Proof. We will only provide an argument for (ii) for clause [2]
of Definition 4.12, which involves projection. Suppose id2:Z->A ∈
Drv1(e[X]), where id1:X[Z]->X[A] is in Drv1(e) and id2 is as
described in clause [2]. First note that id2(st)(x) = id1(st)(x'),
where x' = x[x[Z]/X[Z]]. Then a ∈ e[X][Z=x[Z]][A](st) if and only
if there is a t ∈ e(st) such that a = t[X][A] = t[X[A]] and
t[X[Z]] = t[X][Z] = x[Z] = x'[X[Z]], i.e., if and only if a ∈
e[X[Z]=x'[X[Z]]][X[A]](st), which means, by induction, that a =
id1(st)(x') = id2(st)(x). Clause [5], for cross product,
involves similar notational manipulations.

□

The  main  result  of  the  chapter  can  now  be  stated  as  a 
corollary  to  Theorem  4.6: 

Corollary. Let s ∈ RSch and e ∈ Exp(s). Then every element of
Drv(e) is valid.

□


4.4. Summary and Conclusions

This  chapter  studied  the  problem  of  determining  the 
consistency  of  relational  structure  mappings.  We  reduced  this  to 
the  problem  of  calculating  valid  constraints  on  relational 
algebra  expressions.  Existing  rules  for  functional  dependencies 
were extended, and this extension involved equality constraints
and  derivation  trees  (or  identifiers)  of  functional  dependencies. 

We  have  provided  one  half  of  the  solution  to  the  state  mapping 
problem.  If  our  rules  were  incorporated  into  a  mapping  processor 
as  in  the  ANSI/SPARC  framework,  we  could  be  100%  confident  in  a 
"yes"  answer  from  the  processor,  where  yes  means  that  the 
structure  mapping  is  consistent.  In  the  next  chapter  we  will  see 
how  confident  we  could  be  in  a  "no"  answer. 


CHAPTER 5


Completeness and Extensions of the Rules
for State Mappings


In  the  last  chapter  we  began  studying  the  problem  of 
recognizing  relational  state  mappings.  We  formulated  a  set  of 
rules,  and  we  showed  that  they  were  sound.  In  this  chapter  we 
want  to  do  two  things.  First,  we  want  to  show  that  the  rules  are 
complete.  As  we  have  already  mentioned,  this  is  important 
because we would want to be confident in every response (in this
case "no, mapping not consistent") of a mapping processor which
incorporated  the  rules.  Second,  we  want  to  extend  the  rules  in 
two  important  ways:  We  want  to  add  subset  constraints  to  the 
schemas  to  capture  more  structure  and  make  corresponding 
additions  to  the  derivation  rules.  We  also  want  to  add 
facilities  so  that  multiple  levels  of  mappings  can  be  processed. 

This chapter has five sections. In Section 1 we first give an
example which shows that the derivation rules are not complete as
they stand. This difficulty is traced to the fact that the
problem we are attacking is actually unsolvable. The set
difference operator has made the mapping language too powerful,
and so we continue the investigation by omitting this operator.
There is still a problem when projections can occur before cross
products in relational algebra expressions, but this problem can
be eliminated. With these modifications, we prove that the rules
are complete in Section 2. As in cases of proofs of completeness
in other areas, Section 2 contributes some difficult and
interesting theorems.

Section 3 discusses subset constraints, why they are useful
and how the derivation rules can be adjusted to take them into
account.

Section 4 notes that the derivation rules will not work when
there are two or more levels of mapping. Being able to have
multiple levels of views is a desirable capability of a database
management system, and we discuss how our rules might be extended
to allow this facility.

Section 5 summarizes the chapter.

5.1. Incompleteness of the Rules and Modifications

We know that Drv1 does not generate any invalid constraints.
The next step in a solution to Problem 2 is to prove
completeness: Does it generate all of the valid constraints? As
we have defined Drv1, the answer is no; Drv1 does not generate
all valid constraints. The following example illustrates this:

s1: R(2), S(2), R:1->2
e: (R+S)-S
Drv1(R+S) = ∅
Drv1(e) = ∅

Clearly, the expression (R+S)-S always denotes a subset of R (it
equals R-S), and therefore the FD 1->2 is valid in (R+S)-S. It
seems, then, that the constraint formula for set difference must
have some knowledge of when formulas are equivalent so that it
could recognize, at least as far as FDs and EQs are concerned,
(R+S)-S as being equivalent to R. It turns out, however, that
functional dependencies are sufficient to capture the undecidable
notion of equivalence of two expressions. The following
definition specifies what we mean by equivalence, and the next
theorem shows the relationship between FDs and the equivalence
property.

Definition 5.1. Let s ∈ RSch, and let e1, e2 ∈ Exp(s) be of degree
n. Then e1 and e2 are equivalent (with respect to s), written
e1=e2, if for every state st ∈ RSt(s), e1(st)=e2(st). If t is an
n-tuple, then e1 and e2 are equivalent with respect to t (with
respect to s), written e1=e2 wrt t, if for every state st ∈
RSt(s), t ∈ e1(st) if and only if t ∈ e2(st) (more precisely,
c(t) ∈ e1(st) if and only if c(t) ∈ e2(st), where c is the
constant interpretation for st).

□


Theorem 5.1. Let s ∈ RSch, and let R be an n-ary relation in s
such that for every domain X of R, the FD R:∅->X is in s. If e1
and e2 are expressions of degree n which do not contain R, then
e1=e2 if and only if every possible FD ∅->X is valid in the
expression E = R+(e1-e2)+(e2-e1).

Proof. If e1=e2, then it is immediate that for every state st,
E(st) = R(st), and so every FD ∅->X is valid in E. Now suppose
e1≠e2. This means that there is some state st and some tuple t
such that either t ∈ e1(st) and t ∉ e2(st), or t ∈ e2(st) and t ∉
e1(st). In either case, t ∈ ((e1-e2)+(e2-e1))(st). Let t'≠t be
any tuple not in (e1+e2)(st). If (e1+e2)(st)=U**n, we can extend
the universe U by adding one more element u, and we can then
construct tuple t' from u. Because R does not appear in either
e1 or e2, we may assume that R(st)={t'}. Then E(st) contains
both t and t' and therefore some FD ∅->X is not true in E(st).

□
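The construction in this proof can be tried out in a single state. The
sketch below is only an illustration under simplifying assumptions:
relations are Python sets of tuples, the values of e1 and e2 in the
state are given directly rather than computed from expressions, and
the function names are hypothetical.

```python
def fd_empty_lhs_holds(rel, col):
    """FD ∅ -> col: column `col` (1-based) is constant across rel."""
    return len({t[col - 1] for t in rel}) <= 1


def build_E(R, e1_val, e2_val):
    """E = R + (e1 - e2) + (e2 - e1), evaluated in one state."""
    return R | (e1_val - e2_val) | (e2_val - e1_val)


def all_constant_fds_hold(rel, degree):
    """True when every FD ∅ -> X, X = 1..degree, holds in rel."""
    return all(fd_empty_lhs_holds(rel, c) for c in range(1, degree + 1))


# R holds a single tuple, so every FD ∅ -> X is true in R itself.
R = {(9, 9)}

# Case 1: e1 and e2 agree in this state, so E collapses to R.
same = build_E(R, {(1, 2), (3, 4)}, {(1, 2), (3, 4)})

# Case 2: e1 and e2 differ on the tuple (3, 4), which then leaks into E.
diff = build_E(R, {(1, 2), (3, 4)}, {(1, 2)})

print(all_constant_fds_hold(same, 2))   # -> True
print(all_constant_fds_hold(diff, 2))   # -> False
```

A disagreement between e1 and e2 shows up as a second tuple in E, which
is exactly what breaks some FD with an empty left-hand side.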

This theorem, which shows that we can use FDs to determine the
equivalence property, is enough to show the impossibility of
calculating  all  valid  FDs  on  arbitrary  relational  algebra 
expressions.  This  is  because  the  problem  of  determining  the 
equivalence  of  two  relational  algebra  expressions  is  undecidable 
as  the  following  theorem  states: 

Theorem 5.2. Let s ∈ RSch. The following two problems are
undecidable:

(i) Given arbitrary relational expressions e1 and e2 in Exp(s) of
degree n, determine if e1=e2.

(ii) Given arbitrary n-tuple t and arbitrary expressions e1 and
e2 in Exp(s) of degree n, determine if e1=e2 wrt t.

Proof. Let us say that two expressions e1 and e2 are strongly
equivalent (with respect to some schema s) if e1(str)=e2(str) for
all structures str ∈ RStr(s). In [Solo] it is proved that the
strong equivalence problem is undecidable when the expressions do
not contain selections. Clearly, the problem is still
undecidable when selections can also appear. From this we
conclude that equivalence problem (i) is undecidable, since an
algorithm for it would still work when the schema contains no
constraints.

As an aside, we will show that we can also reduce equivalence
problem (i) to the strong equivalence problem:

First we show how to express constraints as relational algebra
equalities: An FD R:Z->A is true in a structure str if and only
if (R[Z=Z]R)[A=A'](str) = (R[Z=Z]R)(str), where A'=A+deg(R); a
DEQ R:X=Y is true in str if and only if R[X=Y](str) = R(str), and
a VEQ R:X=V is true in str if and only if R[X=V](str) = R(str).

Given a schema s, let L be the cross product of the left-hand-
sides of every equality as above corresponding to each constraint
in s, and let R be the cross product of the corresponding right-
hand-sides. Note that L(str)=R(str) if and only if str is a
state, and that L(str) ⊆ R(str) for all structures str. Now
suppose that the relations in s are R1,...,Rn. Define the
expression W = R1[1]+...+Rn[1]. Then W has degree 1, and
W(str)=∅ if and only if R(str)=∅ for each R ∈ s. Now define E =
W-(W⊗(R-L))[1]. This expression has the property that E(str)=∅
if and only if str is not a state or str is empty. We now have
the reduction: e1=e2 if and only if e1⊗E is strongly equivalent
to e2⊗E.
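The first step of this aside, expressing an FD as an equality between
two relational algebra expressions, can be checked on small relations.
The following sketch assumes relations are Python sets of tuples with
1-based column numbers; the function names are hypothetical, and
fd_direct is an ordinary FD test included only for comparison.

```python
from itertools import product


def self_join_on(rel, z_cols):
    """R[Z=Z]R: concatenated pairs of tuples of rel that agree on z_cols."""
    return {t1 + t2 for t1, t2 in product(rel, repeat=2)
            if all(t1[c - 1] == t2[c - 1] for c in z_cols)}


def restrict_equal(rel, x, y):
    """rel[X=Y]: keep the tuples whose columns x and y (1-based) are equal."""
    return {t for t in rel if t[x - 1] == t[y - 1]}


def fd_as_equality(rel, degree, z_cols, a):
    """True iff (R[Z=Z]R)[A=A'] == R[Z=Z]R with A' = A + deg(R)."""
    joined = self_join_on(rel, z_cols)
    return restrict_equal(joined, a, a + degree) == joined


def fd_direct(rel, z_cols, a):
    """Ordinary FD check: tuples agreeing on Z must agree on A."""
    seen = {}
    for t in rel:
        key = tuple(t[c - 1] for c in z_cols)
        if seen.setdefault(key, t[a - 1]) != t[a - 1]:
            return False
    return True


good = {(0, 1, 2), (3, 4, 2), (0, 1, 5)}   # column 1 determines column 2
bad = {(0, 1, 2), (0, 4, 2)}               # column 1 does not determine 2

print(fd_as_equality(good, 3, [1], 2), fd_direct(good, [1], 2))
print(fd_as_equality(bad, 3, [1], 2), fd_direct(bad, [1], 2))
```

Both tests agree on these inputs, which is the content of the claim
that an FD can be stated as an equality of expressions.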

For part (ii) of the theorem we prove that e1=e2 if and only
if e1=e2 wrt (d+1,...,d+n), where d is the largest constant
appearing in e1, e2 or s. The 'only if' part is clear. Now
suppose e1=e2 wrt (d+1,...,d+n), and let st be any state. Recall
(Definition 4.2) that st consists of a universe U, a sequence
(c(i) : i=0,1,2,...) of constant symbol interpretations, and a
sequence (R(st) : R ∈ s) of relation interpretations. Let t be
any n-tuple over U. Let (c'(i) : i=0,1,...) be the constant
interpretation obtained from c by setting c'(d+i)=t[i] for 1≤i≤n
and c'(j)=c(j) for j<d+1 and j>d+n. Then the structure st'
consisting of U, c' and (R(st) : R ∈ s) is also a state since the
altered components of the constant symbol interpretation do not
appear in s. Thus t=c'(d+1,...,d+n) ∈ e1(st') if and only if
t=c'(d+1,...,d+n) ∈ e2(st'). Since the altered components of the
constant symbol interpretation also do not appear in e1 or e2,
e1(st')=e1(st) and e2(st')=e2(st), and we get t ∈ e1(st) if and
only if t ∈ e2(st). Since t was arbitrary, e1(st)=e2(st).
Therefore e1=e2.

From this we see that equivalence problem (ii) is undecidable,
since an algorithm for (ii) would provide an algorithm for (i).

□

The  following  note  relates  this  result  to  the  "real  world"  of 
database  management:  Given  some  fixed  computer/database  system, 
there  are  actually  only  a  finite  number  of  states  for  any  schema. 
This  is  because  there  are,  for  example,  only  a  finite  number  of 
integers  representable  in  32  bits,  and  therefore  only  a  finite 
number  of  tuples  of  any  degree,  and  only  a  finite  number  of  sets 
of  tuples.  This  means  that  a  decision  procedure  for  the 
equivalence  of  expressions  would  be  to  check  for  equality  in  each 
of  the  finite  number  of  possible  states.  However,  we  may  pretend 
that  more  memory  can  be  added  to  the  computer  indefinitely.  This 
will  allow  representation  of  arbitrarily  large  integers  and  the 
number  of  representable  states  will  be  unbounded.  It  is  in  this 
situation  that  equivalence  of  expressions  is  undecidable. 
(Pretending  to  be  able  to  add  memory  ad  infinitum  is  a  conceptual 
device  also  used  by  computability  theorists.) 
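
The finite decision procedure mentioned in this note can be sketched
for a toy case. The code below assumes a schema with two relations R
and S of the same degree and no constraints, so that every pair of
finite relations over the chosen universe is a state; expressions are
represented as Python functions, which is an assumption of this sketch
and not the report's syntax.

```python
from itertools import chain, combinations, product


def all_relations(universe, degree):
    """Every subset of universe**degree, as a frozenset of tuples."""
    tuples = list(product(universe, repeat=degree))
    subsets = chain.from_iterable(
        combinations(tuples, k) for k in range(len(tuples) + 1))
    return [frozenset(s) for s in subsets]


def equivalent_on(universe, degree, e1, e2):
    """Compare e1 and e2 in every state (R, S) over a finite universe.

    e1 and e2 are Python functions of two relations; the schema is assumed
    to carry no constraints, so every pair of relations is a state.
    """
    rels = all_relations(universe, degree)
    return all(e1(r, s) == e2(r, s) for r in rels for s in rels)


universe = [0, 1]
print(equivalent_on(universe, 1, lambda r, s: r | s, lambda r, s: s | r))
print(equivalent_on(universe, 1, lambda r, s: r - s, lambda r, s: s - r))
```

The number of states grows doubly exponentially with the size of the
universe and the degree, so this check is only feasible for very small
cases, and it says nothing about larger universes.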

With this undecidability result, we can prove the
undecidability of calculating constraints on expressions:

Theorem 5.3. The problem of determining all valid constraints in
a given relational expression of syntax I over a given schema in
RSch is undecidable.

Proof. If there were a procedure for calculating all constraints
valid in a relational algebra expression, then by Theorem 5.1
there would be a procedure for determining the equivalence of
relational algebra expressions. But this latter problem is
undecidable by Theorem 5.2.

□

Notes: (1) According to the definition of structure (Definition
4.2), constant symbols may be mapped arbitrarily to elements of
the universe. The proof of Theorem 5.2 made essential use of
this freedom. It is possible, however, to require that the
constant symbol interpretation in a structure map different
constant symbols to different universe elements, which seems more
natural, and still to be able to prove the theorem.

(2) Theorem 5.1 used FDs with null left-hand-sides. If the
selection operator were omitted, we could restrict FDs to those
with non-null left-hand-sides, and in this case Theorem 5.1 could
not be used (even though the equivalence problem is still
unsolvable). It is not hard, however, to modify this theorem to
use only FDs with non-null left-hand-sides.

(3) Theorem 5.1 also used the artifice of introducing a new
relation symbol. This theorem can also be proved (with more
difficulty) using only the original schema.

Given  this  undecidability  result,  we  next  ask  what  changes  can 
be  made  to  the  mapping  language  which  will  allow  a  decidable 
complete  set  of  derivation  rules. 

There are two easy-to-specify restrictions which can be placed
on the relational algebra syntax which will eliminate the above
incompleteness. One restriction is to allow relation names to
appear at most once in an expression. Then expressions such as
(R+S)-S will not be allowed, and the constraints valid on a set
difference will always be exactly the ones valid on the first
component of the difference.

Another restriction is to disallow set difference as a
relational algebra operator. Without set difference, we cannot
perform the reduction of Theorem 5.1, and the complete
calculation of constraints on relational algebra expressions not
using set difference should be possible.

Let us consider, then, the language of syntax (II) of
Definition 4.5 in which set difference has been omitted. Does
completeness hold now? With this language, the answer is still
no; not all FDs and EQs are detected.

Before we give a counterexample (pointed out by E.F. Codd), we
will briefly discuss certain behavior of joins of projections.

For this discussion only, we will use different notation which
will allow examples of such joins to be written more succinctly.
The example relation R will have domains named A, B and C. A
join, denoted by *, will be understood to be an equi-join on the
like-named domains of its components, and it will eliminate one
of the joined domains.

Ordinarily, a join of projections such as R[AC]*R[BC] will
contain all the tuples in R, plus extra tuples, i.e.,
R ⊆ R[AC]*R[BC], but R ≠ R[AC]*R[BC]. Take the following
example:


     R          R[AC]      R[BC]      R[AC]*R[BC]

  A  B  C       A  C       B  C        A  B  C

  0  1  2       0  2       1  2        0  1  2
  3  4  2       3  2       4  2        0  4  2
                                       3  1  2
                                       3  4  2
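
The table can be reproduced mechanically. The sketch below assumes
relations are Python sets of tuples and hard-codes the join on the
shared domain C; the function names are hypothetical.

```python
def project(rel, cols):
    """rel[cols]: keep the listed 1-based columns, in the listed order."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def join_ac_bc(ac, bc):
    """R[AC]*R[BC]: equi-join on the shared domain C, keeping one copy of C."""
    return {(a, b, c) for (a, c) in ac for (b, c2) in bc if c == c2}


R = {(0, 1, 2), (3, 4, 2)}          # columns A, B, C
joined = join_ac_bc(project(R, [1, 3]), project(R, [2, 3]))

print(sorted(joined))      # -> [(0, 1, 2), (0, 4, 2), (3, 1, 2), (3, 4, 2)]
print(sorted(joined - R))  # -> [(0, 4, 2), (3, 1, 2)]
```

The two tuples in the difference are the extra tuples introduced by
the join.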

If enough structure is added to R, namely in the form of FDs,
then no extra tuples appear in the join; that is, the join will
be "non-loss". In our example, the FD C->B is sufficient to make
R have a non-loss join. (The above example violates the FD
C->B.) General necessary and sufficient conditions for a relation
to have a non-loss join are given in [AhBU].

Now let us return to constructing a counterexample to the
completeness problem. We use the relation R as above (returning
to the usual notation) and add a second FD 1,2->3 to the FD 3->2
already mentioned. We have:

s: R(3), R:1,2->3, R:3->2

Drv1(R[1,3]) = ∅

Drv1(R[2,3]) = {2->1, 1->1, 2->2}

Drv1(R[1,3][2=2]R[2,3]) = {4->3, 2->3, 1->1, ...}

Drv1((R[1,3][2=2]R[2,3])[1,3,4]) = {3->2, 1->1, ...}

(The ellipsis represents the trivial FDs arising from the
reflexive rules.) Yet it can be shown that for any state st,

R(st) = (R[1,3][2=2]R[2,3])[1,3,4](st),

i.e., that R has a non-loss join. Therefore, the FD 1,2->3 is
valid in the expression, but it is not calculated by Drv1.
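
The non-loss claim can be checked exhaustively for a very small
universe. This is only a sanity check under stated assumptions: the
universe is {0, 1}, relations are Python sets of tuples, and the view
expression is hard-coded in the hypothetical function view rather than
parsed from the algebra.

```python
from itertools import combinations, product


def holds_fd(rel, lhs, rhs):
    """FD lhs -> rhs with 1-based column numbers."""
    seen = {}
    for t in rel:
        key = tuple(t[c - 1] for c in lhs)
        if seen.setdefault(key, t[rhs - 1]) != t[rhs - 1]:
            return False
    return True


def view(rel):
    """(R[1,3][2=2]R[2,3])[1,3,4] evaluated on one relation."""
    left = {(t[0], t[2]) for t in rel}       # R[1,3]
    right = {(t[1], t[2]) for t in rel}      # R[2,3]
    crossed = {l + r for l in left for r in right}
    joined = {t for t in crossed if t[1] == t[3]}   # column 2 = column 4
    return {(t[0], t[2], t[3]) for t in joined}     # project on 1,3,4


# All relations of degree 3 over {0, 1} that satisfy both FDs of s.
tuples = list(product([0, 1], repeat=3))
states = [set(s) for k in range(len(tuples) + 1)
          for s in combinations(tuples, k)
          if holds_fd(set(s), [1, 2], 3) and holds_fd(set(s), [3], 2)]

print(all(view(r) == r for r in states))
print(all(holds_fd(view(r), [1, 2], 3) for r in states))
```

Passing on this universe does not prove the general statement; it only
confirms that the FD 1,2->3 survives in every state small enough to
enumerate.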

Intuitively, it is not hard to see why this happened. The FD
1,2->3 was lost through a projection, but the later join was
non-loss and so the FD reappeared. It should also not be hard to
believe that this problem can be eliminated by restricting the
mapping language. The problem arose by doing projections and
then undoing them by joins. A restricted mapping language which
does all cross products (joins) before any projections may be
just as powerful as the unrestricted language and should not lose
any FDs as happened above. This is what the third version of the
relational algebra syntax does. It first allows cross products
on base relations or selections. Then restrictions are allowed,
and lastly, union and projections are allowed. (The reason
selections are moved inward to base relations will be given
later.)

In order to show that syntax III of Definition 4.5 is just as
powerful as syntax II, we show that every syntax-II-expression
can be transformed to an equivalent syntax-III-expression. The
transformations move cross products inward until their operands
are either base relations, selections or other cross products.
They also move selections in to base relations. The following
theorem accomplishes this:

Theorem 5.4. Let s ∈ RSch, and let e1, e2 and e3 be in Exp(s).
Then the following equivalences are valid:

[1a] e1[X]⊗e2 = (e1⊗e2)[X,D2],
     where D2 is deg(e1)+1, deg(e1)+2, ..., deg(e1)+deg(e2)

[1b] e1⊗(e2[X]) = (e1⊗e2)[D1,X'],
     where X'=X+deg(e1) and D1 is 1, 2, ..., deg(e1)

[2a] e1[X=Y]⊗e2 = (e1⊗e2)[X=Y]

[2b] e1⊗(e2[X=Y]) = (e1⊗e2)[X'=Y'],
     where X'=X+deg(e1) and Y'=Y+deg(e1)

[3a] (e1+e2)⊗e3 = (e1⊗e3)+(e2⊗e3)

[3b] e1⊗(e2+e3) = (e1⊗e2)+(e1⊗e3)

[4]  e[X][Y=V] = e[Y'=V][X], where X[Y]=Y'

[5]  e[X=Y][Z=V] = e[Z=V][X=Y]

[6]  (e1+e2)[X=V] = e1[X=V]+e2[X=V]

[7a] (e1⊗e2)[X=V] = e1[X=V]⊗e2, where X≤deg(e1)

[7b] (e1⊗e2)[X=V] = e1⊗(e2[X'=V]),
     where X>deg(e1) and X'=X-deg(e1)

[8]  e[X][X1=Y1] = e[X1'=Y1'][X], where X1'=X[X1] and Y1'=X[Y1]

[9]  (e1+e2)[X=Y] = e1[X=Y]+e2[X=Y]


Proof. We prove only part (a) of two-part equivalences. The
proofs for parts (b) are analogous.

For [1a], t ∈ e1[X]⊗e2(st) if and only if p1(t) ∈ e1[X](st)
and p2(t) ∈ e2(st). This is true if and only if for some t1,
t1[X]=p1(t), t1 ∈ e1(st) and p2(t) ∈ e2(st), which is true if and
only if t1⊗p2(t) ∈ e1⊗e2(st) and (t1⊗p2(t))[X,D2] = p1(t)⊗p2(t) =
t, i.e., if and only if t ∈ (e1⊗e2)[X,D2](st).

For [2a], t ∈ e1[X=Y]⊗e2(st) if and only if p1(t)[X] = p1(t)[Y],
p1(t) ∈ e1(st) and p2(t) ∈ e2(st), which is true if and only if t
∈ e1⊗e2(st) and t[X] = t[Y], i.e., if and only if t ∈
(e1⊗e2)[X=Y](st).

For [3a], t ∈ (e1+e2)⊗e3(st) if and only if (p1(t) ∈ e1(st) or
p1(t) ∈ e2(st)) and p2(t) ∈ e3(st). This is true if and only if
(p1(t) ∈ e1(st) and p2(t) ∈ e3(st)) or (p1(t) ∈ e2(st) and p2(t)
∈ e3(st)), i.e., if and only if t ∈ e1⊗e3(st) or t ∈ e2⊗e3(st).

For [4], t ∈ e[X][Y=V](st) if and only if t[Y]=V and for some
t' ∈ e(st), t'[X]=t. This is true if and only if for some t' ∈
e(st), t'[X]=t and t'[X[Y]]=V, i.e., if and only if t ∈
e[Y'=V][X](st).

Clause [5] is clearly true.

For [6], t ∈ (e1+e2)[X=V](st) if and only if t ∈ (e1+e2)(st)
and t[X]=V. This is true if and only if (t ∈ e1(st) and t[X]=V)
or (t ∈ e2(st) and t[X]=V), i.e., if and only if t ∈ e1[X=V](st)
+ e2[X=V](st).

For [7a], t ∈ (e1⊗e2)[X=V](st) if and only if t[X]=V, p1(t) ∈
e1(st) and p2(t) ∈ e2(st). This is true if and only if
p1(t)[X]=V, p1(t) ∈ e1(st) and p2(t) ∈ e2(st), i.e., if and only
if t ∈ (e1[X=V]⊗e2)(st).

For [8], t ∈ e[X][X1=Y1](st) if and only if t[X1]=t[Y1] and
for some t' ∈ e(st), t'[X]=t. This is true if and only if for
some t' ∈ e(st), t'[X1']=t'[Y1'] and t'[X]=t, i.e., if and only
if t ∈ e[X1'=Y1'][X](st).

For [9], t ∈ (e1+e2)[X=Y](st) if and only if t[X]=t[Y] and (t
∈ e1(st) or t ∈ e2(st)). This is true if and only if (t ∈ e1(st)
and t[X]=t[Y]) or (t ∈ e2(st) and t[X]=t[Y]), i.e., if and only
if t ∈ (e1[X=Y]+e2[X=Y])(st).

Note that these equivalences actually make no use of the
constraints in s.

□
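
Several of the equivalences of Theorem 5.4 can be spot-checked on
random relations. The sketch below assumes relations are Python sets
of tuples, uses 1-based column numbers, and covers only clauses [1a],
[3a], [7a] and [7b] for fixed choices of X and V; random testing of
this kind illustrates the equivalences but does not replace the proof.

```python
import random


def project(rel, cols):
    """rel[cols] with 1-based column numbers."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def cross(r1, r2):
    """Cross product: every tuple of r1 concatenated with every tuple of r2."""
    return {a + b for a in r1 for b in r2}


def select(rel, x, v):
    """rel[X=V]: column x (1-based) equals the constant v."""
    return {t for t in rel if t[x - 1] == v}


def random_relation(degree, rng, size=4, values=3):
    return {tuple(rng.randrange(values) for _ in range(degree))
            for _ in range(size)}


def check_equivalences(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        e1, e2, e3 = (random_relation(2, rng) for _ in range(3))
        # [1a] with X = [2]: D2 = deg(e1)+1, ..., deg(e1)+deg(e2) = 3, 4
        assert cross(project(e1, [2]), e2) == project(cross(e1, e2), [2, 3, 4])
        # [3a]: cross product distributes over union
        assert cross(e1 | e2, e3) == cross(e1, e3) | cross(e2, e3)
        # [7a] with X = 1 and V = 0: selection moves into the left operand
        assert select(cross(e1, e2), 1, 0) == cross(select(e1, 1, 0), e2)
        # [7b] with X = 3: X' = X - deg(e1) = 1
        assert select(cross(e1, e2), 3, 0) == cross(e1, select(e2, 1, 0))
    return True


print(check_equivalences())
```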

The equivalences of Theorem 5.4 can be viewed as
transformation rules: Given an expression e according to syntax
II, match some part of e with the left-hand-side of an
equivalence of Theorem 5.4 to produce a new expression e' from
the right-hand-side of the equivalence. Repeat the process with
e'. Continue as long as possible. The result will be an
expression according to syntax III which is equivalent to e. We
record this result in the following theorem.

Theorem 5.5. Every expression following syntax II is equivalent
to an expression following syntax III.

□


By  this  theorem,  we  are  justified  in  assuming  that  all 
expressions  conform  to  syntax  III. 

Before we continue with the completeness problem, let us
review the example which was used to show that Drv1 was not
complete for syntax-II-expressions. Recall that we had a
relation R(3) with FDs 1,2->3 and 3->2. For the expression

(R[1,3][2=2]R[2,3])[1,3,4],

we calculated

Drv1((R[1,3][2=2]R[2,3])[1,3,4]) = {3->2, ...}.

In actual fact, the FD 1,2->3 is also valid but is not calculated
by Drv1. Let us see what happens by applying the above
equivalence transformations. One of the two possible
transformations is

(R[1,3][2=2]R[2,3])[1,3,4] --> (by [1a])

(R[3=2]R[2,3])[1,3,4,5][1,3,4] --> (by [1b])

(R[3=3]R)[1,2,3,5,6][1,3,4,5][1,3,4].

We calculate the constraints valid in this last expression as
follows:

Drv1(R) = {1,2->3  3->2  ...}

Drv1(R[3=3]R) = {1,2->3  3->2  3=6  4,5->6  6->5  1,2->6
                 6->2  4,5->2  4,5->3  3->5  2=5  1,5->3
                 2,4->6  1,5->6  2,4->3  ...}
                (3->2 and 3->5 are the same FD)

Drv1((R[3=3]R)[1,2,3,5,6]) = {1,2->3  3->2  3=5  5->4  1,2->5
                              5->2  3->4  2=4  1,4->3  1,4->5  ...}

Drv1((R[3=3]R)[1,2,3,5,6][1,3,4,5]) = {2=4  4->3  2->3  1,3->2
                                       1,3->4  ...}

Drv1((R[3=3]R)[1,2,3,5,6][1,3,4,5][1,3,4]) = {3->2  1,2->3  ...}

These are exactly the constraints in R (to which the expression
is equivalent). The counterexample is no longer a problem, and
we shall soon see that completeness holds for the restricted
syntax.
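
The last two projection steps of this calculation amount to dropping
constraints that mention a removed column and renumbering the rest.
The sketch below shows only that renumbering; it is not the full Drv1
projection rule of Definition 4.12, it assumes the input set already
contains the derived constraints that matter, and the function and
variable names are hypothetical.

```python
def project_constraints(fds, eqs, cols):
    """Carry constraints through the projection [cols].

    fds:  set of (lhs, rhs) pairs, lhs a frozenset of 1-based columns
    eqs:  set of frozenset({x, y}) column equalities
    Only constraints whose columns all survive are kept, renumbered to the
    positions the columns take in the projected expression.
    """
    new_pos = {old: i + 1 for i, old in enumerate(cols)}
    kept_fds = {(frozenset(new_pos[c] for c in lhs), new_pos[rhs])
                for lhs, rhs in fds
                if rhs in new_pos and all(c in new_pos for c in lhs)}
    kept_eqs = {frozenset(new_pos[c] for c in eq)
                for eq in eqs if all(c in new_pos for c in eq)}
    return kept_fds, kept_eqs


def fd(lhs, rhs):
    return (frozenset(lhs), rhs)


# A few of the constraints listed for (R[3=3]R)[1,2,3,5,6]
fds = {fd([1, 2], 3), fd([3], 2), fd([5], 4), fd([3], 4),
       fd([1, 4], 3), fd([1, 4], 5)}
eqs = {frozenset({3, 5}), frozenset({2, 4})}

step1 = project_constraints(fds, eqs, [1, 3, 4, 5])
step2 = project_constraints(*step1, [1, 3, 4])
print(sorted((sorted(l), r) for l, r in step2[0]))
# -> [([1, 2], 3), ([3], 2)]
```

The final set corresponds to the FDs 1,2->3 and 3->2, matching the
last line of the calculation.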

It should be noted that with the revised mapping language it
is still possible for FDs to "fall in the cracks between
relations." The meaning of this expression can be illustrated by
a modification of the example already used:

s1: R(3), R:1,2->3, R:3->2
s2: S(2), T(2), T:2->1
m: S = R[1,3], T = R[2,3]

If in s2 we define a relation U = (S[2=2]T)[1,3,4], we will have
the same situation as before. Namely, U:1,2->3 and U:3->2 are
valid FDs, but the first cannot be calculated from the FDs in the
schema s2. In other words, there is an inter-relational FD in
this schema, but there is no formalism to express it.

For questions involving the consistency of structure mappings,
this hidden FD does not matter; all that is important is that the
FD T:2->1 in s2 is derivable from the FDs in s1.

We should note, however, that we cannot extend the present
derivation rules to cases involving more than one level of
mapping. The following example is a continuation of the one
above:

s1: R(3), R:1,2->3, R:3->2
s2: S(2), T(2), T:2->1
s3: U(3), U:1,2->3, U:3->2
m1: S=R[1,3], T=R[2,3]
m2: U=(S[2=2]T)[1,3,4]

The derivation rules only look at constraints at the immediately
preceding level. The FD U:1,2->3 in s3, as we have seen, is
valid, but we also see that it cannot be derived from constraints
in schema s2. Section 4 will discuss this problem further.

5.2.  Completeness  of  the  Rules 

In  the  above  discussions,  we  first  removed  set  difference  from 
the mapping language, and then we restricted the syntax so that
cross  products  are  performed  before  projections.  With  these 
restrictions  we  can  finally  prove  completeness  of  the  derivation 
rules.  We  begin  with  a  theorem  which  proves  completeness  for 
expressions  without  the  restriction  operator  (syntax  IV),  and 
then  we  prove  it  for  any  expression  of  syntax  III. 

Theorem 5.6. Let s ∈ RSch; let e ∈ Exp(s) according to syntax IV
be consistent. Then any valid EQ on e is a member of Drv1(e),
and if Z->A is valid on e, then for some Z1 ⊆ Z, Z1->A ∈ Drv1(e).
(The condition on FDs compensates for the absence of the
augmentation rule.)

Proof. The proof of this theorem may be found in the Appendix.

□

Now  we  want  to  use  this  theorem  to  prove  completeness  for 
expressions  of  syntax  III,  which  contain  restriction  operators. 
The  restriction  operator  was  omitted  in  proving  Theorem  5.6 
because  we  could  not  construct  the  needed  counterexample  states. 
We  define  a  transformation  from  expressions  of  syntax  III  to 
those  of  syntax  IV.  This  transformation  will  not  be  an 
equivalence,  but  it  will  have  two  properties  which  will  be 
sufficient for proving completeness: Tuples appearing in the
transformed expression will appear in the original, and the
constraints derived on the transformed expression will be the
same as those derived on the original.


Definition 5.2. Let s ∈ RSch and let e be a consistent
expression over s according to syntax III. We may write e as
r1[X1]+...+rn[Xn], where each ri is a restriction of a cross
product (see Definition 4.5). The expression G(e) over s
conforming to syntax IV is defined as follows:

Let r be one of the terms r1,...,rn, and let X=Y be one of the
restrictions in r. Let r' be r with the restriction X=Y removed.
There are three cases: X=V ∈ Drv1(r') for some V, Y=V ∈ Drv1(r')
for some V, and no VEQ on X or Y is in Drv1(r'). In the first
case replace r by r'[Y=V]; in the second case replace r by
r'[X=V], and in the third case replace r by r'[X=V1][Y=V1] +
r'[X=V2][Y=V2], where V1≠V2, and neither V1 nor V2 appear in e or
in s. Now rearrange the resulting expression so that it is again
of the form r1[X1]+...+rn[Xn], where each ri is a restriction of
a cross product. The new expression will have one less
restriction than the original. Repeat this process until there
are no more restrictions. The result is G(e).

□

The  theorem  demonstrating  the  properties  of  G  needs  the 
following  theorem  which  states  that  certain  equivalent 
expressions  have  the  same  set  of  derivable  constraints. 

Theorem 5.7. Let s ∈ RSch, and let e,e1,e2 ∈ Exp(s). Then

(i) Drv1(e[X=Y][Z=V]) = Drv1(e[Z=V][X=Y]),

(ii) Drv1((e1⊗e2)[X=V]) = Drv1(e1[X=V]⊗e2), if X≤deg(e1)

(iii) Drv1((e1⊗e2)[X=V]) = Drv1(e1⊗(e2[X=V])), if X>deg(e1)

(iv) Drv1((e1+e2)[X]) = Drv1(e1[X]+e2[X])

Proof. The proof may be found in the Appendix.

□

Theorem 5.8. Let s ∈ RSch, and let e ∈ Exp(s) be consistent and
of syntax III. Then Drv1(G(e))=Drv1(e), and for every structure
str, G(e)(str) ⊆ e(str).

Proof. The proof may be found in the Appendix.

□


We are now ready to prove the main completeness theorem.


Theorem 5.9. Let s ∈ RSch, and let e ∈ Exp(s) be consistent and
of syntax III. Then any valid EQ on e is a member of Drv1(e),
and if Z->A is valid on e, then for some Z1 C Z, Z1->A ∈ Drv1(e).

Proof. Suppose e is consistent and c ∉ Drv1(e). Since e is
consistent, G(e) is consistent. If c is an EQ not in
Drv1(e)=Drv1(G(e)), or if it is an FD Z->A such that Z1->A ∉
Drv1(e)=Drv1(G(e)) for any Z1 C Z, then by Theorem 5.6, there is
a state st such that c is false in G(e)(st). Since G(e)(st) C
e(st), and an FD or EQ that fails in a set of tuples also fails
in every larger set, c is also false in e(st).

□

Corollary. Let s ∈ RSch and e ∈ Exp(s) of syntax III be
consistent. Then a constraint c on e is valid if and only if c ∈
Drv(e).

Proof. This is immediate from Theorems 4.2 and 5.9.

□

5.3. Subset Constraints

In  this  section  we  will  discuss  a  new  kind  of  statement  which 
we call a subset constraint. We will give some motivation for
studying this kind of constraint, and then we will discuss how
inference rules for subset constraints can be integrated with the
rules already presented for functional dependencies and
equalities. In Section 5.4, subset constraints will provide a
convenient  tool  for  further  discussion  of  the  completeness 
problem. 

Subset  constraints  are  very  important  within  the  relational 
model.  For  example,  consider  the  following  relations: 

Employee(emp#, name, address, div#, dept#)

Division(div#, name, head_office)

Department(div#, dept#, name, loc)

In the Employee relation, the attributes div# and dept# identify
or "point to" the Department to which an Employee is attached.
Employees must be attached to "real, existing" Departments. That
is, given any Employee tuple (e#,n,a,dv#,dp#), there must be some
Department tuple with dv#,dp# as its key values. Similarly,
Departments are identified by a department number within a
division which is identified by a division number. Departments
must exist within known divisions; i.e., for every Department
tuple (dv#,dp#,n,l) there must be a Division tuple with dv# as
its key value.

The requirement that every pair of div#, dept# values
appearing in the Employee relation must also appear in the
Department relation can be expressed by a set-theoretic inclusion
on projections:

Employee[div#,dept#] C Department[div#,dept#].

The constraint on division numbers is

Department[div#] C Division[div#].

The attributes div#, dept# in Employee and div# in Department are
often called foreign keys since there are corresponding attributes
in other relations which form keys. The "foreign attributes"
need not, however, be keys, as the next example shows:

Plant(name, city, product)

Job-Applicant(name, address, preferred-city)

A Job-Applicant always specifies which city he would prefer to
work in. The preferred city must, of course, be one in which
there are plants located. Hence there is the constraint

Job-Applicant[preferred-city] C Plant[city].

However, there may be several plants in one city, so city is not
a key of Plant, and 'preferred-city' cannot strictly be called a
foreign key.
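
An inclusion of this kind can be checked mechanically on a given
state. The following Python sketch is illustrative only: the helper
names project and subset_holds and the sample tuples are assumptions
made for this example, and a relation is modelled simply as a set of
tuples with 1-based domain positions.

```python
def project(rel, cols):
    """Project a relation (a set of tuples) onto 1-based domain positions."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def subset_holds(r, x, s, y):
    """True if the subset constraint R[X] C S[Y] holds for these tuple sets."""
    return project(r, x) <= project(s, y)


# Hypothetical sample state for Plant(name, city, product) and
# Job-Applicant(name, address, preferred-city).
plant = {
    ("P1", "Toronto", "bolts"),
    ("P2", "Toronto", "nuts"),   # two plants in one city: city is not a key
    ("P3", "Ottawa", "gears"),
}
applicant = {
    ("Ann", "12 Elm St", "Toronto"),
    ("Bob", "9 Oak Ave", "Ottawa"),
}

# Job-Applicant[preferred-city] C Plant[city]
ok = subset_holds(applicant, [3], plant, [2])

# An applicant whose preferred city has no plant violates the constraint.
applicant_bad = applicant | {("Cy", "1 Pine Rd", "Halifax")}
bad = subset_holds(applicant_bad, [3], plant, [2])

print(ok, bad)
```

Note that the check does not require city to be a key of Plant; it
compares the two projections as sets.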

Subset  constraints  are  also  an  essential  component  of 
hierarchical  data  models.  In  this  context  they  take  the  form: 
"Every  child  must  have  a  parent."  Take  the  following  hierarchy 
as  an  example: 


Department(d#, name, chairman)

Course(c#, title, credits)

Lecture(l#, room, time).

Every  lecture  occurrence  must  be  associated  with  a  course 
occurrence,  and  every  course  occurrence  must  have  a  department 
parent.  By  the  process  of  normalization  (to  First  Normal 
Form)  [  Codd72fc  ],  we  can  express  this  hierarchy  by  a  set  of 
relations  with  key  and  subset  constraints: 

Department(d#, name, chairman)

Course(d#, c#, title, credits)

Lecture(d#, c#, l#, room, time)

Lecture[d#,c#] C Course[d#,c#]

Course[d#] C Department[d#].

Network  data  models  also  have  features  corresponding  to  subset 
constraints.  For  example,  suppose  we  had  the  following  Codasyl 
schema: 


Driver(dr-lic-no, name, age)
   |
   | Own
   v
Car(reg-no, make, model)

If the Own set type is mandatory, we could represent this set
type and the two record types by two relations and a subset
constraint:

Driver(dr-lic-no, name, age)

Car(reg-no, make, model, owner)

Car[owner] C Driver[dr-lic-no].

Having established that subset constraints are important
concepts to investigate, we now consider how they may be
incorporated into the system of constraints already constructed.

We will define a new data model, called
RDm' = (RSch', RStr', RSt', RQ'), which will be like RDm with the
exception of having subset constraints in the schemas in addition
to FDs and EQs.


Definition 5.3. The set RSch' of schemas consists of relation
declarations, functional dependencies and equalities as in
Definition 4.1. In addition, it contains subset constraints
(SSs) of the form

name1[X] C name2[Y].

A general subset constraint has the form e1[X] C e2[Y], where e1
and e2 are in Exp(s).

For each s ∈ RSch', the set RStr'(s) of structures is defined
by RStr'(s)=RStr(s*), where s* ∈ RSch is obtained from s by
deleting all SSs: the structures for RDm' are the same as those
for RDm.

Given s ∈ RSch' and str ∈ RStr'(s), an SS e1[X] C e2[Y] is
satisfied or true in str if e1(str)[X] C e2(str)[Y]. (The SS
"e1[X] C e2[Y]" is a formal string of symbols; the condition
e1(str)[X] C e2(str)[Y] is a set-theoretic inclusion which
happens to have a similar appearance.)

Given s ∈ RSch' and st ∈ RStr'(s), st is a state, i.e., a
member of RSt'(s), if every FD, EQ and SS in s is true in st. (We
do not deal with the operations, RQ'(s), in this chapter.)

The state component RMs' of the mapping model from RDm' to
RDm' may be defined exactly as in Definition 4.7.

□

The  problem  again  is  to  determine  when  a  structure  mapping  is 
consistent: 

Problem 5. Given s1, s2 ∈ RSch' and m ∈ RMs'(s1,s2), determine
whether or not m is consistent.

As before, we reduce this problem to one of determining validity
of constraints:

Theorem 5.10. Let s1, s2 ∈ RSch' and m ∈ RMs'(s1,s2). Then m is
consistent if and only if for each constraint c in s2, the
translation of c into s1 by m is valid in s1.

□


Problem 6. Given s ∈ RSch' and e1, e2 ∈ Exp(s), determine
whether or not e1:Z->A, e1:X=Y, e1:X=V and e1[X] C e2[Y] are
valid.

Now  we  will  discuss  some  rules  for  deriving  new  constraints 
from  old  ones  when  SSs  are  involved. 

The set-theoretic inclusion relation 'C' is transitive and
reflexive. Since SSs are valid when the corresponding
set-inclusions hold in the structure, we can postulate the rules

R[X] C R[X], and

if R[X] C S[Y] and S[Y] C T[Z],
then R[X] C T[Z].

Another  property  is  that  the  domains  mentioned  in  an  SS  can  be 
made  smaller.  In  the  first  example  of  this  section,  from  the  SS 

Employee[div#,dept#] C Department[div#,dept#],

we can infer that

Employee[div#] C Department[div#] and
Employee[dept#] C Department[dept#]

are valid. How to express this in the formal notation becomes
clear if these two derived SSs are written:

Employee[div#,dept#][div#] C Department[div#,dept#][div#] and
Employee[div#,dept#][dept#] C Department[div#,dept#][dept#].

In  other  words,  we  can  pick  out  which  domain  to  keep  in  the 
"smaller"  SS  by  using  a  projection  operation.  The  following  rule 
should  then  be  valid: 

if R[X] C S[Y], then R[X[Z]] C S[Y[Z]].

The converse operation would be to make the domains X and Y
larger. For example, we might want to derive R[1,2] C S[1,2]
from R[1] C S[1] and R[2] C S[2]. This is not a sound inference,
however, as can be seen by the following simple example:


R        S

0 1      0 2
         3 1

Intuitively, the reason this fails is that given a tuple (x,y) in
R(str), the tuple in S(str) that has the x in domain 1 need not
be the same as the tuple which has the y in domain 2. Even if
the two sets of domains overlap, the inference is still invalid:
In the example below, R[1,2] C S[1,2] and R[2,3] C S[2,3] are
true, but R[1,2,3] C S[1,2,3] is false:

R          S

0 1 2      0 1 3
           4 1 2

But suppose domain 2 is a key of S, i.e., the FDs S:2->1 and
S:2->3 are valid. (The above example violates these FDs, and so
does not represent an allowable state.) Then given a tuple
(x,y,z) of R, the only way to have a tuple t1 in S with x in
domain 1 and y in domain 2, and a tuple t2 in S with y in domain
2 and z in domain 3, is for t1 and t2 to be the same tuple. This
means the SS R[1,2,3] C S[1,2,3] would be valid. The
generalization of this rule is the following:

if R[X1] C S[Y1] and R[X2] C S[Y2] and Y1∩Y2->S,
then R[X1+X2] C S[Y1+Y2].
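
The counterexample with overlapping domains, and the effect of
requiring domain 2 to be a key of S, can be replayed on concrete
tuple sets. The Python sketch below is only an illustration under
stated assumptions: relations are sets of tuples, the helper names
are invented for this example, and the exhaustive search covers only
three-domain relations over the values {0,1} with single-tuple R, so
it is a sanity check and not a proof of the rule.

```python
from itertools import combinations, product


def project(rel, cols):
    """Project a set of tuples onto 1-based domain positions."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def fd_holds(rel, lhs, rhs):
    """True if the FD lhs -> rhs (lists of 1-based domains) holds in rel."""
    seen = {}
    for row in rel:
        key = tuple(row[c - 1] for c in lhs)
        val = tuple(row[c - 1] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def included(r, s, cols):
    """True if R[cols] C S[cols]."""
    return project(r, cols) <= project(s, cols)


# The example from the text: both narrow SSs hold, the wide one fails,
# and S violates the key condition on domain 2.
R = {(0, 1, 2)}
S = {(0, 1, 3), (4, 1, 2)}
narrow_ok = included(R, S, [1, 2]) and included(R, S, [2, 3])
wide_ok = included(R, S, [1, 2, 3])
key_ok = fd_holds(S, [2], [1]) and fd_holds(S, [2], [3])


def key_rule_counterexamples(values=(0, 1)):
    """Search small states where domain 2 is a key of S for a violation."""
    tuples = list(product(values, repeat=3))
    found = []
    for k in range(len(tuples) + 1):
        for chosen in combinations(tuples, k):
            s = set(chosen)
            if not (fd_holds(s, [2], [1]) and fd_holds(s, [2], [3])):
                continue
            for t in tuples:
                r = {t}
                if (included(r, s, [1, 2]) and included(r, s, [2, 3])
                        and not included(r, s, [1, 2, 3])):
                    found.append((r, s))
    return found


print(narrow_ok, wide_ok, key_ok, key_rule_counterexamples())
```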

In the example above we see that the FDs 2->1 and 2->3 on S
were "inherited" by R. In general, if all of the domains of an
FD or EQ are included in a set Y of domains of S, then the FD or
EQ (suitably renamed) will also hold in R whenever an SS R[X] C
S[Y] is valid. For if we have a set of tuples in which an
equality (DEQ or VEQ) holds, that equality will hold in any
subset of the tuple set. For FDs, we note that functional
relationships cannot be destroyed by making a tuple set smaller.
The rules for inheriting FDs and EQs are therefore:

if R[X] C S[Y], then

if S:Y[X1]=Y[Y1], then R:X[X1]=X[Y1];

if S:Y[X1]=V, then R:X[X1]=V, and

if S:Y[Z]->Y[A], then R:X[Z]->X[A].


For convenience, the notation names the FDs and EQs relative to X
and Y. This means, for example, that the inference

R[2,3,1] C S[5,1,8] and S:5=8, therefore R:2=1

can be written

R[2,3,1] C S[5,1,8] and S:[5,1,8][1]=[5,1,8][3], therefore
R:[2,3,1][1]=[2,3,1][3].
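
The renaming of domains relative to X and Y amounts to a position
lookup. The following Python sketch shows one way to carry it out;
the function names inherit_eq and inherit_fd are hypothetical, and
the sketch assumes that Y contains no repeated domain, so that each
domain of S has a unique position in Y.

```python
def inherit_eq(x, y, eq):
    """Given R[X] C S[Y] and an EQ a=b on S, return the inherited EQ on R.

    x and y are equal-length lists of 1-based domains (y is assumed to be
    duplicate-free).  Returns None when a or b is not listed in y, since
    the inheritance rule does not apply in that case.
    """
    a, b = eq
    if a not in y or b not in y:
        return None
    return (x[y.index(a)], x[y.index(b)])


def inherit_fd(x, y, lhs, rhs):
    """Given R[X] C S[Y] and an FD lhs -> rhs on S, return the FD on R."""
    if rhs not in y or any(d not in y for d in lhs):
        return None
    return ([x[y.index(d)] for d in lhs], x[y.index(rhs)])


# The example from the text: R[2,3,1] C S[5,1,8] and S:5=8 give R:2=1.
eq_on_r = inherit_eq([2, 3, 1], [5, 1, 8], (5, 8))

# A hypothetical FD S:5,1->8 would be inherited as R:2,3->1.
fd_on_r = inherit_fd([2, 3, 1], [5, 1, 8], [5, 1], 8)

# An EQ mentioning a domain outside Y is not inherited.
not_inherited = inherit_eq([2, 3, 1], [5, 1, 8], (5, 7))

print(eq_on_r, fd_on_r, not_inherited)
```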

We include one more rule for generating new constraints. If
the EQ X=Y is valid in R, it is clear that we can write R[X] C
R[Y] and R[Y] C R[X]. The converse does not hold. (Take, for
example, the tuple set {(0,1), (1,0)}. The projection on either
domain is {0,1}, but the EQ 1=2 does not hold.)
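
The failure of the converse can be seen directly on the two-tuple
set just mentioned. This is a minimal Python sketch, with helper
names chosen only for this illustration:

```python
def project(rel, cols):
    """Project a set of tuples onto 1-based domain positions."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def eq_holds(rel, a, b):
    """True if domains a and b agree in every tuple of rel."""
    return all(row[a - 1] == row[b - 1] for row in rel)


rel = {(0, 1), (1, 0)}

# Both inclusions R[1] C R[2] and R[2] C R[1] hold ...
both_inclusions = (project(rel, [1]) <= project(rel, [2])
                   and project(rel, [2]) <= project(rel, [1]))

# ... but the EQ 1=2 does not.
equality = eq_holds(rel, 1, 2)

print(both_inclusions, equality)
```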

The  above  discussion  presented  some  rules  governing  the 
interaction  of  FDs,  EQs  and  SSs.  Our  goal  is  to  derive  a  set  of 
inference  rules  for  this  expanded  set  of  constraints.  When  we 
were  dealing  only  with  FDs  and  EQs,  we  were  able  to  define  a 
closure  operator  which  worked  only  on  constraints  of  a  single 
expression.  This  was  because  FDs  and  EQs  refer  only  to  a  single 
relation. Subset constraints refer to two relations, and this
interaction makes the previous reduction impossible. This means
that rather than computing sets Drv(e) of constraints valid for a
particular expression e, an algorithm must, at first glance,
perform a more global calculation. The following example will
illustrate  this  point: 

s1:  R(2), S(4), T(4)
     S[1] C T[4]
s2:  U(6), V(2)
     U[3] C V[2]
m:   U = R[2=3]S, V = (R+T)[1,4]

By substituting the mapping expressions for U and V into the
constraint in s2, we get the following expression to prove:

(R[2=3]S)[3] C (R+T)[1,4][2].


If we defined derivation sets Drv(e) on single expressions, then
the above constraint would be in both Drv(R[2=3]S) and
Drv((R+T)[1,4]). The rules for generating the sets would,
however, be open ended in that they would also generate in
Drv((R+T)[1,4]), for example, the following SSs:

R[2=1](R[2=3]S)[5] C (R+T)[1,4][2]

R[2=1](R[2=1](R[2=3]S))[7] C (R+T)[1,4][2]

R[2=1](R[2=1](R[2=1](R[2=3]S)))[9] C (R+T)[1,4][2]

etc.

We cannot allow the rules to freely generate new SSs with ever
larger expressions on their left- or right-hand side. The proper
way to proceed is to start with a set S of constraints where each
constraint is labelled with the expression(s) it applies to.
Then all constraints applicable only to the already named
expressions are generated. Next, a set of constraints is
generated which refers to expressions derivable from the previous
set by the application of only one relational algebra operator.
We continue this procedure as long as necessary. The number of
times this needs to be done is determined by the largest
expressions in schema s2. In the example we have been using, we
would need three iterations:

zeroth:  S[1] C T[4]

first:   (R[2=3]S)[3] C T[4] or
         S[1] C (R+T)[4]

second:  (R[2=3]S)[3] C (R+T)[4] or
         S[1] C (R+T)[1,4][2]

third:   (R[2=3]S)[3] C (R+T)[1,4][2].

The  next  definitions  formally  present  the  procedures  just 
discussed.  The  first  lists  the  inference  rules.  These  consist 
of  the  ones  from  Definition  4.11  for  FDs  and  EQs  with 
modifications,  and  also  of  the  ones  for  SSs  informally  described 
previously  in  this  section.  The  second  definition  shows  how  to 
use the rules to build new constraint sets on larger expressions.

Definition 5.4. Let s ∈ RSch', and let S be a set of constraints
on expressions over s. The closure Cl'(S) is defined by the
following inductive rules:


[1]  S C Cl'(S).

[2]  e:X=Y ∈ Cl'(S) => e:Y=X ∈ Cl'(S).

[3]  e:X=Y, e:Y=Z ∈ Cl'(S) => e:X=Z ∈ Cl'(S).

[4]  e:X=Y, e:Y=V ∈ Cl'(S) => e:X=V ∈ Cl'(S).

[5]  e:X=V, e:Y=V ∈ Cl'(S) => e:X=Y ∈ Cl'(S).

[6]  if e appears in S, then e:X=X ∈ Cl'(S).

[7]  e:X=Y ∈ Cl'(S) => e:id:X->Y ∈ Cl'(S), where id is X->Y.

[8]  e:X=V ∈ Cl'(S) => e:id:∅->X ∈ Cl'(S), where id is 'V'->X.

[9]  e:id1:Z->A1, e:id2:Z->A2 ∈ Cl'(S), Rdc(id1)=Rdc(id2) =>
     e:A1=A2 ∈ Cl'(S).

[10] e:id:Z->A ∈ Cl'(S) => e:id:Z+X->A ∈ Cl'(S).

[11] e:id1:Z->A, e:id2:A+X->B ∈ Cl'(S) => e:id3:Z+X->B ∈ Cl'(S),
     where id3 is obtained from id2 by merging a copy of id1 to
     every leaf of id2 labelled A.

[12] e1[X] C e2[Y], e2[Y] C e3[Z] ∈ Cl'(S) => e1[X] C e3[Z] ∈
     Cl'(S).

[13] e1[X] C e2[Y] ∈ Cl'(S) => e1[X[Z]] C e2[Y[Z]] ∈ Cl'(S).

[14] e1[X1] C e2[Y1], e1[X2] C e2[Y2], Y1∩Y2->e2 ∈ Cl'(S) =>
     e1[X1+X2] C e2[Y1+Y2] ∈ Cl'(S), where Y1∩Y2->e2 means that
     e2:Y1∩Y2->d ∈ Cl'(S) for each domain d of e2.

[15] e1[X] C e2[Y], e2:Y[U]=Y[W] ∈ Cl'(S) => e1:X[U]=X[W] ∈
     Cl'(S).

[16] e1[X] C e2[Y], e2:Y[U]=V ∈ Cl'(S) => e1:X[U]=V ∈ Cl'(S).

[17] e1[X] C e2[Y], e2:Y[Z]->Y[A] ∈ Cl'(S) => e1:X[Z]->X[A] ∈
     Cl'(S).

[18] e1:X=Y ∈ Cl'(S) => e1[X] C e1[Y] ∈ Cl'(S).

□
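
As an illustration of how such inductive rules are applied to a
fixed point, the following Python sketch closes a set of SSs under
the transitivity rule [12] only. The encoding of an SS e1[X] C e2[Y]
as a tuple (e1, X, e2, Y), with expressions as strings and domain
lists as tuples, is an assumption of this sketch; the other rules
of the definition are not implemented here.

```python
def close_transitive(ss):
    """Close a set of SSs under rule [12] (transitivity).

    Each SS e1[X] C e2[Y] is encoded as the tuple (e1, X, e2, Y).
    """
    closed = set(ss)
    changed = True
    while changed:
        changed = False
        for (e1, x, e2, y) in list(closed):
            for (f2, y2, e3, z) in list(closed):
                # Rule [12] needs the right side of the first SS to be
                # identical to the left side of the second.
                if (e2, y) == (f2, y2):
                    new = (e1, x, e3, z)
                    if new not in closed:
                        closed.add(new)
                        changed = True
    return closed


base = {
    ("R", (1, 2), "S", (1, 2)),   # R[1,2] C S[1,2]
    ("S", (1, 2), "T", (1, 2)),   # S[1,2] C T[1,2]
}
closure = close_transitive(base)
print(sorted(closure))
```

The loop terminates because every derived SS pairs a left side and
a right side that already occur in the input, so only finitely many
new SSs can be added.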

Definition 5.5. Let S be a set of marked constraints. The
constraint successor set ConSucc(S) is defined:

ConSucc(S) = Cl'(S+S'),

where S' is defined by the following rules in which c is an FD or
an EQ:

[1a] If e:c ∈ S and every domain appearing in c is contained in
     U, then e[U]:c' ∈ S', where c' is obtained from c by
     replacing each domain d by n where U[n]=d.

[1b] If e1[X] C e2[Y] ∈ S and X C U, then e1[U][X'] C e2[Y] ∈
     S', where U[X'] = X.

[1c] If e1[X] C e2[Y] ∈ S and Y C U, then e1[X] C e2[U][Y'] ∈ S',
     where U[Y'] = Y.

[2a] If e is any expression in S, then e[U=W]:U=W ∈ S'.

[2b] If e:c ∈ S, then e[U=W]:c ∈ S'.

[2c] If e1[X] C e2[Y] ∈ S, then e1[U=W][X] C e2[Y] ∈ S'.

[2d] If e1[X] C e2[Y] ∈ S and e2:U=W ∈ S, then e1[X] C
     e2[U=W][Y] ∈ S'.

[2e] If e1[X] C e2[Y] ∈ S, then e1[U=W][X] C e2[U=W][Y] ∈ S'.

[3a] If e is any expression in S, then e[U=V]:U=V ∈ S'.

[3b] If e:c ∈ S, then e[U=V]:c ∈ S'.

[3c] If e1[X] C e2[Y] ∈ S, then e1[U=V][X] C e2[Y] ∈ S'.

[3d] If e1[X] C e2[Y] ∈ S and e2:U=V ∈ S, then e1[X] C
     e2[U=V][Y] ∈ S'.

[4a] If e1:c ∈ S, then e1⊗e2:c ∈ S'.

[4b] If e2:c ∈ S, then e1⊗e2:c' ∈ S', where c' is obtained from
     c by replacing each domain d by d+deg(e1).

[4c] If e1[X] C e2[Y] ∈ S and e3 appears in S, then (e1⊗e3)[X] C
     e2[Y] ∈ S' and (e3⊗e1)[X'] C e2[Y] ∈ S', where
     X'=deg(e3)+X.

[4d] If e1[X] C e2[Y] ∈ S and e3 appears in S, then e1[X] C
     (e2⊗e3)[Y] ∈ S' and e1[X] C (e3⊗e2)[Y'] ∈ S', where
     Y'=deg(e3)+Y.

[5a] If c' is an EQ and e1:c' ∈ S and e2:c' ∈ S, then e1+e2:c' ∈
     S'.

[5b] If e1:id1:Z->A ∈ S and e2:id2:Z->A ∈ S and Rdc(id1) =
     Rdc(id2), then e1+e2:id1:Z->A ∈ S'.

[5c] If e1[X] C e2[Y] ∈ S and e3 appears in S, then e1[X] C
     (e2+e3)[Y] ∈ S'.

[5d] If e1[X] C e2[Y] ∈ S and e3[X] C e2[Y] ∈ S, then (e1+e3)[X]
     C e2[Y] ∈ S'.

[6a] If e1:c ∈ S and e2 appears in S, then e1-e2:c ∈ S'.

[6b] If e1[X] C e2[Y] ∈ S and e3 appears in S, then (e1-e3)[X] C
     e2[Y] ∈ S'.

□

If we start with S equal to the relations and constraints
listed in the schema, then ConSucc(S) is the set of derivable
constraints on expressions involving one or zero operators. To
find out if a constraint e:Z->A, e:X=Y, e:X=V or e1[X] C e2[Y] is
valid, we apply the ConSucc operator to the schema constraints k
times, where k is the number of relational algebra operators in
the given constraints, and see if the desired constraint is in the
result. The soundness of the rules given above is expressed in
the next theorem by stating that every application of ConSucc
produces valid constraints:

Theorem 5.11. Let s ∈ RSch'. Let S be a set of valid
constraints on members of Exp(s). Every set in the sequence
S0=Cl'(S), S1=ConSucc(S0), S2=ConSucc(S1), S3=ConSucc(S2), ...
is a set of valid constraints.

Proof.  (omitted) 

□ 

In  contrast  to  the  rules  given  for  FDs  and  EQs  only,  the  rules 
given  in  this  section  are  bottom-up  in  nature.  That  is,  they 
start  with  the  constraints  in  the  schema  on  base  relations;  they 
derive  all  constraints  on  expressions  with  one  operator,  then  two 
operators,  and  so  on.  It  would  be  more  desirable  to  work  from 
the top down as we did in defining Drv: decide if, say, e1[X] C
e2[Y] is valid by looking at the valid constraints on
subexpressions of e1 and e2.

However,  it  is  not  clear  that  constraints  on  "unreferenced" 
expressions  can  be  ignored.  For  example,  suppose  the  following 
transitivity  rule  is  used: 

e1[X] C e2[Y] and e2[Y] C e3[Z],
therefore e1[X] C e3[Z].

If expression e2 is not a subexpression of e1 or e3, how do we
know that we can still derive the SS e1[X] C e3[Z] without "going
through" e2[Y]? What we are asking is the following: Can
this application of the transitivity rule be replaced by one or
more applications of transitivity on SSs of base relations? In
other words, if all possible SSs are generated on the base
relations, and then only the relevant relations are retained, can
we still generate all relevant constraints that ConSucc generates
starting with all constraints?


Let  us  give  a  more  detailed  example.  Suppose  we  are  given  the 
schemas: 


s1:  R(3), S(3), T(3), W(3)
     R[1,2] C S[1,2],
     S[1,2] C T[1,2]
c:   (R[3=1]W)[1,2] C (T[3=1]W)[1,2]

A  derivation  of  this  SS  might  proceed  as  follows: 


(1)  R[1,2] C S[1,2]                      in s1

(2)  (R[3=1]W)[1,2] C (S[3=1]W)[1,2]      (1) and rules [2e,4c,4d]

(3)  S[1,2] C T[1,2]                      in s1

(4)  (S[3=1]W)[1,2] C (T[3=1]W)[1,2]      (3) and rules [2e,4c,4d]

(5)  (R[3=1]W)[1,2] C (T[3=1]W)[1,2]      (2), (4) and rule [12]

The  last  line  was  derived  by  transitivity  using  the  expression 
S[3=1]W  which  is  not  a  subexpression  of  any  part  of  c.  Note, 
however,  that  there  is  an  alternate  derivation  which  uses 
transitivity  at  the  start  and  does  not  use  these  "foreign" 
expressions: 


(1)  R[1,2] C S[1,2]                      in s1

(2)  S[1,2] C T[1,2]                      in s1

(3)  R[1,2] C T[1,2]                      (1), (2) and rule [12]

(4)  (R[3=1]W)[1,2] C (T[3=1]W)[1,2]      (3) and rules [2e,4c,4d]

We define below a second version of the ConSucc operator, this
time with two arguments. The new argument is a set of
expressions, and the operator adds a new constraint only if the
constraint is labelled by subexpressions of some element of the
second argument. We will then claim that all of the desired
constraints are still generated.

Definition 5.6. Let E be a set of expressions and let S be a set
of constraints. Then the limited constraint successor set
ConSucc1(S,E) is defined:

ConSucc1(S,E) = Cl'(S+S1'),
where S1' = E*∩S',

with S' as in Definition 4.16 and where E* is the set of all
possible constraints labelled by subexpressions of members of E.


(Writing E*∩S' is shorthand for saying that the only new
constraints generated are ones labelled by subexpressions of E.)

□

(Note: When the generation of new constraints is limited by the
set E as above, we can generate all the constraints on E all at
once rather than in successive stages as ConSucc1 does. That is,
we could define an operator 'Close' with one set of inductive
rules such that Close(E) contains all derivable constraints on
(sub)expressions of E. The rules would basically be the union of
the rules for Cl' and ConSucc. However, in order to prove the
equivalence of the two procedures, it is advantageous to keep as
much compatibility as possible.)

As  we  noted  above,  we  need  to  use  SSs  on  all  relations  in  the 
schema  only  once.  Thereafter  we  can  use  the  limited  constraint 
successor. In the next theorem we repeatedly apply the ConSucc1
operator  starting  with  the  closure  of  the  base  constraints,  and 
we  claim  the  results  are  the  same  as  when  ConSucc  is  applied. 

Proposition 5.12. Let s ∈ RSch'. Let S be a set of constraints
on members of Exp(s). Let E be the set of all expressions
occurring in S. Then every set in the sequence U0=Cl'(S),
U1=ConSucc1(U0,E), U2=ConSucc1(U1,E), ... is a set of valid
constraints. In fact,

Uk = E*∩Sk,

where Sk is defined in Theorem 5.11.

Proof. (omitted)

□


5.4. Extensions to Many Levels

There are two interpretations we can give to the necessity of
the restriction of the relational algebra syntax needed to prove
completeness of Drv. We may take the view that only expressions
conforming to syntax III will be considered for possible state
mappings; that is, the operator Drv will only be applied to
syntax-III expressions. Another point of view is that the
equivalences of Theorem 5.4 are incorporated into the Drv
algorithm. Then given any expression (according to syntax II),
the algorithm will transform the expression into an equivalent
syntax-III form and then calculate the constraints on the
equivalent form.

Whichever point of view we take, we find that the
constructions are inadequate when another level of mapping is
introduced. We repeat the example preceding Theorem 5.6:

s1:  R(3), R:1,2->3, R:3->2
s2:  S(2), T(2)
m1:  S = R[1,3], T = R[2,3]
s3:  U(3)
m2:  U = (S[2=2]T)[1,3,4]

If we apply the Drv operator to s1 and m1, we will get the single
constraint T:2->1 valid in s2. This is correct and complete;
there are no other FDs valid on base relations of s2. Now if we
apply the Drv operator to s2 and m2, we find only the constraint
U:3->2 valid in s3. Yet as we know, the FD U:1,2->3 is also
valid.
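
The two-level example can be replayed on one concrete state. The
Python sketch below is an illustration only: the sample tuples are
invented, the helper names are assumptions of this sketch, and
checking a single state shows that the FD U:1,2->3 is not violated
there but does not by itself establish validity.

```python
def project(rel, cols):
    """Project a set of tuples onto 1-based domain positions."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def join(s, t, i, j):
    """Cross product of s and t restricted to s-domain i = t-domain j."""
    return {a + b for a in s for b in t if a[i - 1] == b[j - 1]}


def fd_holds(rel, lhs, rhs):
    """True if the FD lhs -> rhs (lists of 1-based domains) holds in rel."""
    seen = {}
    for row in rel:
        key = tuple(row[c - 1] for c in lhs)
        val = tuple(row[c - 1] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# A hypothetical state of s1 satisfying R:1,2->3 and R:3->2.
R = {(1, 10, 100), (2, 10, 100), (1, 20, 200), (3, 20, 201)}
base_ok = fd_holds(R, [1, 2], [3]) and fd_holds(R, [3], [2])

# First level (m1), then second level (m2).
S = project(R, [1, 3])
T = project(R, [2, 3])
U = project(join(S, T, 2, 2), [1, 3, 4])

same_as_r = (U == R)
hidden_fd = fd_holds(U, [1, 2], [3])   # the FD missed at the second level
found_fd = fd_holds(U, [3], [2])       # the FD that Drv does find

print(base_ok, same_as_r, hidden_fd, found_fd)
```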

The problem is that the rules for calculating constraints
assume that the "base" relations are independent and really at
"ground level". When we generate a view on top of a view, these
assumptions fail to hold. From the first point of view mentioned
above, the view on top of a view can produce an illegal
(non-syntax-III) expression. From the other point of view, the
intermediate view blocks the transformation of expressions into
syntax-III form. Since it has been proposed that multiple layers
of external schemas should be allowed [ANSI77], we need to extend
the formalism to keep track of "hidden" FDs.

The first step is to allow constraints in a schema which do
not refer to base relations. For example, in schema s2 above we
could put:

s2:  S(2), T(2)
     T:2->1, S[2=2]T:1,3->4

The notation is therefore no problem, but we must postulate rules
to generate these hidden FDs.


The example we have been using to illustrate problems with
completeness is the simplest case of a non-loss join of relation
projections. Necessary and sufficient conditions for a join to
be non-loss have recently been published [AhBU]. We will outline
below how the algorithms for calculating non-loss joins can be
used in the present context.

Definition 5.7. Let s ∈ RSch; let e ∈ Exp(s), and let X1 and X2
be subsets of {1,...,n} such that X1+X2 = {1,...,n}. Let
jc(X1,X2) be the join-condition equating the duplicated domains
of X1 and X2, and let p(X1,X2) be the projection which will
eliminate the duplicated domains from e[X1][jc(X1,X2)]e[X2] and
put the domains in their original order. The join
e[X1][jc(X1,X2)]e[X2] is non-loss if for every state st of s,

(e[X1][jc(X1,X2)]e[X2])[p(X1,X2)](st) = e(st).

□

The  following  theorem  is  an  adaptation  to  our  notation  of 
Corollary  1  in  [AhBU]. 

Theorem 5.13. The join e[X1][jc(X1,X2)]e[X2] is non-loss if and
only if e:X1∩X2->X1 or e:X1∩X2->X2 is valid.

□
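
For a relation whose only constraints are FDs, the condition of
this theorem can be tested with the usual attribute-closure
computation. The Python sketch below substitutes that standard
technique for the Drv operator of this thesis, which is not
reproduced here; the function names and the encoding of an FD as a
pair (left-side domains, right-side domain) are assumptions of the
sketch.

```python
def attr_closure(attrs, fds):
    """Domains functionally determined by attrs under the given FDs.

    Each FD is a pair (lhs, rhs): a tuple of domains and a single domain.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and rhs not in closure:
                closure.add(rhs)
                changed = True
    return closure


def non_loss(x1, x2, fds):
    """Test X1∩X2 -> X1 or X1∩X2 -> X2 by attribute closure."""
    common = set(x1) & set(x2)
    determined = attr_closure(common, fds)
    return set(x1) <= determined or set(x2) <= determined


# The schema s1 of the running example: R:1,2->3 and R:3->2.
fds = [((1, 2), 3), ((3,), 2)]

# S = R[1,3] and T = R[2,3] share domain 3, and 3 -> 2.
join_s_t = non_loss([1, 3], [2, 3], fds)

# Splitting on domain 2 instead gives no such guarantee.
join_other = non_loss([1, 2], [2, 3], fds)

print(join_s_t, join_other)
```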

Using  this  theorem,  we  will  modify  the  ConSucc  operator  to  take 
into  account  non-loss  joins. 

Definition 5.8. Let s ∈ RSch', and let S be a set of constraints
on elements of Exp(s). The set ConSucc2(S) is defined by

ConSucc2(S) = Cl'(S+S2'),
where S2' is defined by the following rules:

[1a] to [6b] are as in Definition 5.5.

[8] If e appears in S and for some X1, X2, e[X1][jc(X1,X2)]e[X2]
    is non-loss, then for every constraint e:c in S the
    constraint e':c' is in S2', where e' is the above join.

□

Actually, joins on two projections are not enough. For example,
suppose we have the situation:


s1:  R(6), 1->3, 2->3, 3->4,
     5->4, 5->1, 5->2
s2:  R1(3), R2(3), R3(2), R4(3)
m:   R1 = R[2,5,6], R2 = R[1,2,5],
     R3 = R[1,4], R4 = R[3,5,6]

If on top of s2 we define

U = (((R1[1,2=2,3]R2)[4=1]R3)[2=2]R4)[4,1,9,8,2,3],

we will have R(st) = U(st) for all states st. (This is example 2
in [AhBU].) We will not do so here, but extensions to Definition
5.8 and Theorem 5.13 for n-way joins can be made, making use of
Theorem 3 in [AhBU].
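
The four-way join can be checked on a sample state. In the Python
sketch below the state is invented, the helper names are
assumptions, and the final projection list is taken to be
[4,1,9,8,2,3], which is the list that restores the six domains of R
in their original order after the three joins. A single state
illustrates the claim R(st) = U(st) but does not prove it.

```python
def project(rel, cols):
    """Project a set of tuples onto 1-based domain positions."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def join(s, t, pairs):
    """Cross product of s and t restricted by pairs (i, j), each meaning
    s-domain i = t-domain j (1-based, relative to each operand)."""
    return {a + b for a in s for b in t
            if all(a[i - 1] == b[j - 1] for i, j in pairs)}


def fd_holds(rel, lhs, rhs):
    """True if the FD lhs -> rhs (lists of 1-based domains) holds in rel."""
    seen = {}
    for row in rel:
        key = tuple(row[c - 1] for c in lhs)
        val = tuple(row[c - 1] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# A hypothetical state of s1 over R(6).
R = {
    ("a1", "b1", "c1", "d1", "e1", "f1"),
    ("a1", "b1", "c1", "d1", "e1", "f2"),
    ("a2", "b1", "c1", "d1", "e2", "f1"),
    ("a3", "b2", "c2", "d1", "e3", "f3"),
}
schema_fds = [([1], [3]), ([2], [3]), ([3], [4]),
              ([5], [4]), ([5], [1]), ([5], [2])]
state_ok = all(fd_holds(R, lhs, rhs) for lhs, rhs in schema_fds)

# The four projections of the mapping m.
R1 = project(R, [2, 5, 6])
R2 = project(R, [1, 2, 5])
R3 = project(R, [1, 4])
R4 = project(R, [3, 5, 6])

# U = (((R1[1,2=2,3]R2)[4=1]R3)[2=2]R4)[4,1,9,8,2,3]
j1 = join(R1, R2, [(1, 2), (2, 3)])
j2 = join(j1, R3, [(4, 1)])
j3 = join(j2, R4, [(2, 2)])
U = project(j3, [4, 1, 9, 8, 2, 3])

print(state_ok, U == R)
```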

5.5.  Summary  and  Conclusions 

Since  it  is  desirable  that  structure  mappings  map  states  to 
states,  the  problem  of  deciding  when  a  structure  mapping  has  this 
property  was  studied  in  Part  III  of  this  thesis.  This  problem 
was  reduced  to  the  problem  of  deciding  the  validity  of 
constraints  on  relational  algebra  expressions  within  a  schema. 

In  Chapter  4  we  formulated  a  set  of  rules  for  determining  this 
property,  and  we  showed  that  the  rules  were  sound.  In  the 
present  chapter  we  studied  the  more  difficult  problem  of  the 
completeness  of  the  derivation  rules.  In  Section  1  we  saw  that 
this  problem  is  unsolvable  when  general  relational  algebra  is 
used.  The  first  restriction  we  made  was  to  eliminate  the  set 
difference  operator  from  the  algebra.  The  formulas  in  this  case 
were  still  incomplete,  but  a  further  modification  was  made  to  the 
mapping  language  requiring  that  cross  products  (joins)  occur 
in  expressions  before  projections.  This  did  not  affect  the  power 
of  the  language,  and  the  rules  for  this  case  were  shown  to  be 
complete  in  Section  2.  The  proof  of  completeness  was  a  difficult 
task,  and  a  large  portion  of  the  chapter  was  devoted  to  it. 

Section  5.3  looked  at  how  the  derivation  rules  can  be  extended 
to  include  subset  constraints.  Subset  constraints  are  important 
for  modelling  many  real-world  situations,  and  also  for  modelling 
the features of other data models (see Chapter 8). We formulated
extensions of the derivation rules which involved subset
constraints. The derivation rules were complicated by the fact
that  subset  constraints  refer  to  two  expressions.  We  stated 
without  proof  a  proposition  which  claims  that  this  complication 
is  not  serious. 

In  the  last  section,  we  discussed  why  the  second  restriction 
to the mapping language, that joins occur first, would need to be
relaxed  in  order  to  take  into  account  multiple  levels  of 
mappings.  Some  additional  rules  for  calculating  valid 
constraints  with  an  arbitrary  order  of  relational  algebra 
operations  were  proposed. 

Besides presenting particular algorithms for the particular
models studied, this chapter and the previous one have also shown
that special care must be taken when designing a mapping
language. While query languages with the power of Predicate
Calculus of first and higher orders may be desirable, such power
causes problems when the languages are used to define structure
mappings. In ANSI/SPARC terminology, we want the schema
processors in the framework (see Section 1.4) to be able to give
"no" answers with the same level of confidence as they give "yes"
answers. That is, when an external schema, say, along with its
mapping to the conceptual schema is submitted to the external
schema processor, an acceptance should really mean that the
mapping is consistent, but a rejection should just as well really
mean that the mapping is inconsistent. The former property
corresponds to soundness of the rules used, the latter to
completeness of the rules. Thus the investigations of these two
chapters do not have merely academic interest; many mapping
problems illustrated in Chapter 1 require algorithms which are
both sound and complete.


PART  IV 


Relational Operation Mappings


CHAPTER 6

Operations  and  Operation  Mappings 


In Part III we studied some important properties of relational
structure mappings. In Part IV we look at operation mappings.

In the present, sixth chapter, we will address two problems.
First, there is the problem of defining the semantics of update
operations in the data model. We will argue that the traditional
notions of update semantics are too restrictive in the context of
mappings, and we will suggest proper generalizations. Second, we
will attack the problem of formulating rules for recognizing
desirable properties of operation mappings. Like the analogous
problem for structure mappings, this problem is important because
such rules would be essential components of ANSI/SPARC-like
mapping processors. For a weak version of update semantics we
find that the formulation of sound and complete rules for
recognizing a correct type of operation mapping is possible, and
we give such a formulation.

In this chapter there are five sections. In Section 1, we
introduce the set of operations to be studied: it is a basic set
consisting of insertions specifying a single tuple to be inserted
and deletions specifying a number of tuples to be deleted
according to values in some but not necessarily all domains. We
give a number of possible semantics for these operations and
argue that each is plausible. We also define a class of
operation mappings. We make these mappings as simple as possible
while leaving them powerful enough to be interesting.


In Section 2 we explain how our generalized semantics fit in
with operation mappings. The problem we then concentrate on is
when the commutative diagram of Definition 2.3 is satisfied given
that the before- and after-structures of the database are states.
We reduce this problem to a problem of determining the truth
value of three-place predicates whose arguments consist of a
sequence of operations (right-hand-sides of operation mappings),
a relational algebra expression and a tuple. (This reduction is
analogous to the reduction in Part III of the problem of
determining state mappings to the problem of calculating
constraints on relational algebra expressions.)

Section  3  then  formulates  a  set  of  rules  for  the  predicates 
for  weak  semantics,  and  shows  that  they  are  sound.  Section  4 
shows  the  procedures  are  complete  if  set  difference  is  omitted. 
During  this  process  we  again  have  an  unsolvable  problem  when  the 
set  difference  operator  is  included  in  the  mapping  language. 

Section 5 summarizes and draws conclusions for the chapter.

6.1.  Relational Operations and Operation Mappings

In Chapter 4 we defined the first three components of the data
model RDm=(RSch,RStr,RSt,RQ). Now we want to define RQ.

According to Definition 2.3, given s ∈ RSch, an element q of
RQ(s) is a function RStr(s)->RStr(s). There is no result
component in the structures of this model, as was mentioned in
Chapter 2; we will not deal with retrieval operations in this
chapter. We will define insert and delete operations. Both
types of operations will have a very simple syntax: an element
of RQ(s) will specify the operation (insert or delete), the
relation operated on and a tuple specification. For an
insertion, the tuple specification is the tuple itself. In other
words, one insert operation inserts one tuple into the target
relation, and all values for the tuple are specified. For
example, insert(R,(0,2)) will insert (0,2) into R. It may be
useful to allow null values, but for now they will not be
considered. (For inserting into a projection, it may be nice to
be able to use null values in the projected domains. However, we
will simply choose some arbitrary positive integer for the
projected domains.)


In a deletion, the tuple specification gives specific values
for some but not necessarily all domains. The tuples deleted
will be the ones whose domain values match the specified ones.
They may have any values in the domains not specified. For
deletions, then, one operation may cause many tuples to be
deleted from the target relation. For example, delete(R,(-,3,-))
will cause all tuples which have a '3' in the second domain and
arbitrary values in the other domains to be deleted from R.

In the first definition of this chapter, we give the syntactic
portion of the definition of RQ; the functional specification
will follow.

Definition 6.1. The set Tup(n) of tuples is the set of all
n-tuples of positive integer (symbols). The set BTup(n) of tuples
with blanks (b-tuples) is the set of all n-tuples whose
coordinates are positive integer (symbols) or blanks ('-').

Let s ∈ RSch. The set RQ(s) of operations with respect to s
consists of all elements of the form

insert(R,w) or delete(R,w'),
where R(n) ∈ s, w ∈ Tup(n) and w' ∈ BTup(n).

□

The  following  paragraphs  will  relate  these  operations  to  other 
types  of  operations. 

When dealing with views, if we want to delete even a single
tuple from a projection, we will have to perform a bulk delete
from the unprojected relation. Take the following example:

s1: R(3)
s2: S(2)
m:  S = R[1,2]

To interpret the operation delete(S,(5,4)) requires at least the
operation delete(R,(5,4,-)), since if even one tuple beginning
(5,4,-) is left in R, the tuple (5,4) will still be in S. This is
one reason b-tuples are needed.

Also, with our deletes we can represent bulk deletes on
equality conditions combined arbitrarily by ands and ors. For
example, suppose we are interested in the operation

delete R where (R.1=0 or R.1=1) and R.2=5 and (R.4=0 or
R.4=3).

We first put the where-clause into disjunctive normal form:

(R.1=0 and R.2=5 and R.4=0) or (R.1=0 and R.2=5 and R.4=3) or
(R.1=1 and R.2=5 and R.4=0) or (R.1=1 and R.2=5 and
R.4=3).

The first clause corresponds to delete(R,(0,5,-,0)), the second,
third and fourth to delete(R,(0,5,-,3)), delete(R,(1,5,-,0)) and
delete(R,(1,5,-,3)), respectively. Then the original delete is
equivalent to

delete(R,(0,5,-,0)); delete(R,(0,5,-,3)); delete(R,(1,5,-,0));
delete(R,(1,5,-,3)).
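The expansion just carried out by hand can be sketched in a few
lines of Python. This is only an illustration under simplifying
assumptions: the where-clause is taken to be an "and" over domains
of an "or" over values (as in the example), the function name
where_to_btuples is our own, and the blank is represented by the
string "-".

```python
from itertools import product

BLANK = "-"  # assumed representation of the blank coordinate

def where_to_btuples(degree, conditions):
    """conditions maps a 1-based domain number to the list of values
    allowed for it (an 'or' of equalities); the domains are 'and'-ed.
    Returns one b-tuple per clause of the disjunctive normal form."""
    domains = sorted(conditions)
    btuples = []
    # Each choice of one value per constrained domain is one DNF clause.
    for choice in product(*(conditions[d] for d in domains)):
        w = [BLANK] * degree
        for d, value in zip(domains, choice):
            w[d - 1] = value
        btuples.append(tuple(w))
    return btuples

# (R.1=0 or R.1=1) and R.2=5 and (R.4=0 or R.4=3) on a relation R(4)
deletes = where_to_btuples(4, {1: [0, 1], 2: [5], 4: [0, 3]})
for w in deletes:
    print("delete(R,%s)" % (w,))
```

Each b-tuple produced corresponds to one delete operation of the
sequence given above; unconstrained domains are left blank.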

Certain single tuple updates can be expressed as a delete plus
an insert. For example,

update R (0,1,2) to (1,1,2)

is simply

delete(R,(0,1,2)); insert(R,(1,1,2)).

However, we cannot express bulk updates or updates involving
functions (e.g., increase all salaries by $500).

In summary, we have defined a set of operations which are
simple but which represent a significant portion of DML
functions.

Now we want to specify what kind of functions (or relations)
RStr(s)->RStr(s) correspond to the insert and delete operations.

For simplicity, we assumed in the above discussion that
operations always did the traditional thing: insert(R,w) inserts
exactly w into R and alters no other relation; delete(R,w)
deletes exactly the tuples specified by w from R and alters no
other relation. When the operations are those on a view, it may
be impossible to make these assumptions. Consider the following
example:

s1: R(2), S(2)
s2: T(4)
m:  T = R[2=1]S

Let st be the state given by:

     R        S          T
    1 2      2 4      1 2 2 4
    4 0

The operation insert(T,(3,2,2,1)), if interpreted as
insert(R,(3,2)); insert(S,(2,1)), will produce the state:

     R        S          T
    1 2      2 4      1 2 2 4
    4 0      2 1      1 2 2 1
    3 2               3 2 2 4
                      3 2 2 1

Here we see that the operation of inserting one tuple into a view
resulted in not only that tuple but also others being inserted
into the view. For the insert of (3,2,2,1) into T, there is no
way to avoid generating the "spurious" tuples without also making
deletions from the view. (We could always delete (1,2,2,4).)

A slightly different situation occurs when interpreting the
operation insert(T,(5,0,0,8)) as insert(R,(5,0)); insert(S,(0,8)).
This time, the resulting state is

     R        S          T
    1 2      2 4      1 2 2 4
    4 0      0 8      4 0 0 8
    5 0               5 0 0 8

Again, an extra tuple has appeared. Note, however, that if we
interpret the operation as
insert(R,(5,0)); insert(S,(0,8)); delete(R,(4,0)), then the result
is an "exact" insertion into T:

     R        S          T
    1 2      2 4      1 2 2 4
    5 0      0 8      5 0 0 8

Note, however, that the semantics of the relations may make the
deletion of (4,0) from R impossible.
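The states in this example can be recomputed mechanically. The
following Python sketch is an illustration only: it represents
relation states as sets of tuples, and the helper join_view and
the variable names are our own, not part of the formalism.

```python
def join_view(R, S):
    """T = R[2=1]S: pair each R tuple with every S tuple whose first
    coordinate equals the R tuple's second coordinate."""
    return {r + s for r in R for s in S if r[1] == s[0]}

# The state st of the example.
R = {(1, 2), (4, 0)}
S = {(2, 4)}
before = join_view(R, S)

# insert(T,(3,2,2,1)) interpreted as insert(R,(3,2)); insert(S,(2,1))
after = join_view(R | {(3, 2)}, S | {(2, 1)})
spurious = after - before - {(3, 2, 2, 1)}

# insert(T,(5,0,0,8)) without and with the extra delete(R,(4,0))
plain = join_view(R | {(5, 0)}, S | {(0, 8)})
exact = join_view((R | {(5, 0)}) - {(4, 0)}, S | {(0, 8)})

print(sorted(spurious))
print(sorted(plain))
print(sorted(exact))
```

Here spurious holds the tuples that entered T without being
requested, plain contains the extra tuple (4,0,0,8), and exact is
the state of T when (4,0) is also deleted from R.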

What  this  example  has  illustrated  is  that  for  one  reason  or 
another,  operations  on  views  cannot  always  be  expected  to  produce 
the  traditional  results,  that  is,  results  in  which  an  insert 
operation inserts just the specified tuples and in which a delete
operation deletes just the specified tuples. (This phenomenon
can  also  result  from  other  causes  than  the  above  "connection 
trap".)  Thus  the  user  of  a  view  must  sometimes  lower  his 
expectations  and  accept  a  weaker,  nondeterministic  semantics  for 
the  insert  and  delete  operations. 

Before we present the semantics for RQ, we need some
constructs involving b-tuples.

A b-tuple can be naturally associated with two sets of tuples.
The first is the set of fully specified tuples to which the
b-tuple refers, i.e., the tuples which agree with the b-tuple
wherever the b-tuple has a value. This agreement relation we can
formally define as:

w<w' if and only if for each domain X,

w[X] ≠ '-' => w[X] = w'[X].

Examples are (1,-,3)<(1,2,3), (1,-,3)<(1,0,3) and
(1,2,-)<(1,2,3), but (1,-,3)≮(2,2,3) and (-,-,0)≮(0,0,1). If w
is a member of BTup(n), we will let ##w denote the set of fully
specified tuples related to w, i.e., ##w = {w' ∈ Tup(n) : w<w'}.

According to the (so far, informal) interpretation of deletion,
the operation delete(R,w) will delete the set ##w from the state
of R.

The second set associated with a b-tuple is similar to the one
just defined. Note that in the definition of the relation <, it
is not necessary that w' be a fully specified tuple. As examples
we have (-,-,-)<(-,1,-), (1,2,-)<(1,2,-) and (-,-,6)<(6,-,6), but
(-,-,0)≮(-,-,-) and (1,-,1)≮(2,-,1). We will let #w denote the
set of such b-tuples. Formally, #w = {w' ∈ BTup(n) : w<w'}. The
set #w of b-tuples related to w will help to relate delete
operations. For if w<w', then delete(R,w) deletes at least the
tuples that delete(R,w') does. In the following definition, we
have made a slight generalization in allowing a set S in place of
the single tuple w.

Definition 6.2. Let w and w' be in BTup(n). Then w<w' if for
every coordinate X such that w[X] is not blank, w[X]=w'[X]. For
any set S ⊆ BTup(n), #S is the set {w' ∈ BTup(n) : for some w ∈
S, w<w'} and ##S is the set {w' ∈ Tup(n) : for some w ∈ S, w<w'}.


□
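The relation < and the sets #S and ##S of Definition 6.2 can be
illustrated by a small Python sketch. Two assumptions are ours:
the blank is represented by the string "-", and, since Tup(n) is
infinite, #S and ##S are enumerated only over a finite list of
values supplied by the caller. The function names are hypothetical.

```python
from itertools import product

BLANK = "-"  # assumed representation of the blank coordinate

def agrees(w, w2):
    """w < w2: wherever w is not blank, w2 carries the same value."""
    return len(w) == len(w2) and all(
        a == BLANK or a == b for a, b in zip(w, w2))

def sharp(S, n, values):
    """#S, restricted to b-tuples built from `values` and blanks."""
    candidates = product(list(values) + [BLANK], repeat=n)
    return {c for c in candidates if any(agrees(w, c) for w in S)}

def double_sharp(S, n, values):
    """##S, restricted to fully specified tuples over `values`."""
    return {c for c in sharp(S, n, values) if BLANK not in c}

# Tuples that delete(R,(-,3,-)) refers to, over the values 1 and 3.
print(sorted(double_sharp({(BLANK, 3, BLANK)}, 3, [1, 3])))
```

Restricting to a finite value list is a convenience for the
illustration; in the definition the sets range over all positive
integer symbols.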

With  the  above  definitions,  we  can  precisely  define  the 
semantics  of  insert  and  delete. 

We recognize two types of characteristics which the semantics
may have. One attribute of an operation semantics is whether or
not operations on different relations are independent. That is,
an operation op(R,w) (where op is insert or delete) may have no
effect on other relations S ≠ R, or it may also change one or
more relations distinct from R. Of course, no effect is what we
would expect, but as with the previous example with the
connection trap, relations which are views sometimes do not
behave as we would like them to. The following example
illustrates this:

s1: R(2), S(2), T(2)
s2: U(2), V(2)
m:  U = R+S, V = S*T
t:  insert(U,w) -> insert(S,w)

By translating the operation insert(U,w) by t, the insertion will
also be visible in the view relation V. Situations like this may
be unavoidable, and the semantics should make provisions for
overlapping views. Hence, we should speak about dependent and
independent semantics. Dependent semantics makes no statement
about "inter-relational side-effects" (changes to non-target
relations), while independent semantics specifies that there are
no inter-relational side-effects.

The other important attribute of the semantics has to do with
intra-relational side-effects as exemplified by connection traps.

What we will call strong semantics specifies no such
side-effects. Namely, if R(st) is the state of R before an
operation insert(R,w), then after the operation, the state of R is
R(st)+{w}. After an operation delete(R,w), the state of R will
be R(st)-##w.

When intra-relational side-effects are caused by connection
traps, we note that, with insert, the side-effects are restricted
to the insertion of extra tuples, and with deletions, the
side-effects involve only deletion of extra tuples. We abstract
this behavior in the definition of medium semantics. Here, if R(st)
is the state of R before an operation insert(R,w), then medium
semantics specifies that R(st)+{w} is a subset of (but not
necessarily equal to) the result. This requirement means that
nothing is deleted from R, but that extra tuples may have been
inserted. For deletion, the requirement of medium semantics is
that after the operation delete(R,w), the state of R is a subset
of R(st)-##w. Here, nothing can be inserted, but tuples in
excess of those in ##w may have been deleted.

For the third type of semantics, note that the least an
insertion can be expected to do is to insert the given tuple.
Similarly, the very least that is desired of a deletion is that
it delete the referenced tuples. We can define a weak semantics
along these lines. After an operation insert(R,w), regardless of
the state of R before the operation, we require that w be an
element of the state of R. For the operation delete(R,w), weak
semantics will require that no element of ##w is an element of
the state of R after the operation.

This kind of semantics could occur, for example, in a
PLANNER-like [Hewi] database. An ASSERT command will insert a
tuple into some predicate (relation), but an Antecedent Theorem
might use this new tuple to delete other tuples from the database.

In  dealing  with  problems  involving  medium  and  strong 
semantics,  which  have  a  stronger  tie  with  traditional  database 
management,  we  will  find  it  useful  to  first  deal  with  the  more 
primitive  weak  semantics. 

We have discussed the various types of semantics to be
investigated in relation to operation mappings. With the
semantics having two "coordinates", one with two options, the
other with three, we are in reality defining six variants of our
data model RDm. Rather than introduce a large amount of notation
to distinguish these six models, we will speak of RDm as a single
model and talk about different possibilities for the meaning of
the operation set RQ.

Definition 6.3. RQ has independent semantics if for every s ∈
RSch, str ∈ RStr(s), relation R(n) ∈ s, relation S ∈ s distinct
from R, w ∈ Tup(n) and w' ∈ BTup(n):

S(insert(R,w)(str)) = S(str) and
S(delete(R,w')(str)) = S(str).

Otherwise RQ has dependent semantics.

RQ has strong semantics if for every s ∈ RSch, str ∈ RStr(s),
relation R(n) ∈ s, w ∈ Tup(n) and w' ∈ BTup(n)

R(insert(R,w)(str)) = R(str)+{w} and
R(delete(R,w')(str)) = R(str)-##w'.

RQ has medium semantics if for every s ∈ RSch, str ∈ RStr(s),
R(n) ∈ s, w ∈ Tup(n) and w' ∈ BTup(n)

R(str)+{w} ⊆ R(insert(R,w)(str)) and
R(delete(R,w')(str)) ⊆ R(str)-##w'.

RQ has weak semantics if for every s ∈ RSch, str ∈ RStr(s),
R(n) ∈ s, w ∈ Tup(n) and w' ∈ BTup(n)

w ∈ R(insert(R,w)(str)) and
##w' ∩ R(delete(R,w')(str)) = ∅.

(Note: To ease the notation we have avoided referring to the
interpretations in a structure of the constant symbols. To be
precise, the set ##w should be replaced by the set
{(c(x1),...,c(xn)) : (x1,...,xn) ∈ ##w}, where (c(i) : i ∈ Z+) is
the constant symbol interpretation in str.)

□
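To make the three intra-relational conditions concrete, the
following Python sketch classifies a single observed transition
of one relation R. It is an illustration under assumptions of our
own: Definition 6.3 quantifies over all schemas and structures,
whereas the sketch inspects only one before/after pair; states
are sets of tuples, the blank is "-", and the function names are
hypothetical.

```python
BLANK = "-"  # assumed representation of the blank coordinate

def matches(w, t):
    """True if the fully specified tuple t belongs to ##w."""
    return all(a == BLANK or a == b for a, b in zip(w, t))

def insert_level(before, after, w):
    """Strongest condition met by an insert(R,w) taking the state of
    R from `before` to `after`; None if not even the weak one holds."""
    if after == before | {w}:
        return "strong"
    if before | {w} <= after:
        return "medium"
    if w in after:
        return "weak"
    return None

def delete_level(before, after, w):
    """Same classification for delete(R,w), with w a b-tuple."""
    # Only members of ##w occurring in either state can matter.
    doomed = {t for t in before | after if matches(w, t)}
    if after == before - doomed:
        return "strong"
    if after <= before - doomed:
        return "medium"
    if not (doomed & after):
        return "weak"
    return None

# The connection-trap insertion into T seen earlier is only medium.
T_before = {(1, 2, 2, 4)}
T_after = {(1, 2, 2, 4), (1, 2, 2, 1), (3, 2, 2, 4), (3, 2, 2, 1)}
print(insert_level(T_before, T_after, (3, 2, 2, 1)))
```

Since each stronger condition implies the weaker ones, the
functions report only the strongest condition that holds.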

We have defined the operation component RQ of the data model
RDm, and now we want to define the operation mapping component
RMq of RMm. According to the framework presented in Chapter 2,
if t ∈ RMq(s1,s2), then t has the functionality
RQ(s2) x RStr(s1) -> RQ(s1)*. As we saw in Section 6 of that
chapter, this functionality can be achieved with an abstract
programming language where operations of RQ(s1) appear as
statements, and queries on s1 appear in expressions. Consider
the following example:

s1: R(2), S(2)
s2: T(4)
m:  T = R[2=1]S

t(q,st) =
   if q is insert(T,w)
      then insert(R,w[1,2]); insert(S,w[3,4])
   else if q is delete(T,w)
      then delete(R,w[1,2]);
           if R[2=w[2]](st) = ∅
              then delete(S,w[3,4])
           fi
      fi
   fi


The translation t contains three if-statements. The first two
simply determine whether q is an insert or a delete operation.
Delete operations are translated by:

delete(R,w[1,2]);
if R[2=w[2]](st) = ∅
   then delete(S,w[3,4]).

This is a state dependent mapping which says: "To delete w from
T, first delete components 1 and 2 of w from R; then if there are
no more tuples in R whose second component equals the second
component of w, i.e., if w[3,4] in S no longer has anything in R
with which to join, then delete also w[3,4] from S."
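The state dependent translation just described can be sketched in
Python. The sketch makes assumptions that are ours: w is a fully
specified 4-tuple, a state is a dictionary from relation names to
sets of tuples, operations are triples (op, relation, tuple), and
the test on R is evaluated as the prose describes, that is, on the
tuples remaining after the first delete. The helper names are
hypothetical.

```python
def join_view(R, S):
    """T = R[2=1]S."""
    return {r + s for r in R for s in S if r[1] == s[0]}

def translate(q, st):
    """t(q,st) for the view T: a list of operations on R and S."""
    op, rel, w = q
    assert rel == "T"
    if op == "insert":
        return [("insert", "R", w[:2]), ("insert", "S", w[2:])]
    ops = [("delete", "R", w[:2])]
    # Tuples left in R that could still join with w[3,4] in S.
    remaining = {r for r in st["R"] if r != w[:2] and r[1] == w[1]}
    if not remaining:
        ops.append(("delete", "S", w[2:]))
    return ops

def apply(ops, st):
    """Apply base operations with the traditional (strong) meaning."""
    new = {name: set(rows) for name, rows in st.items()}
    for op, rel, w in ops:
        if op == "insert":
            new[rel].add(w)
        else:
            new[rel].discard(w)
    return new

st = {"R": {(1, 2), (3, 2)}, "S": {(2, 4)}}
ops = translate(("delete", "T", (1, 2, 2, 4)), st)
st_after = apply(ops, st)
print(ops, sorted(join_view(st_after["R"], st_after["S"])))
```

In this state (3,2) still joins with (2,4), so only the delete on
R is generated and (3,2,2,4) remains in T.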

Procedural mappings such as this one are useful and powerful.
For example, the above translation is similar to the DELETE
SELECTIVE command of Codasyl [Coda]. However, the dependence of
the mapping on the database state complicates the analysis of
these mappings. In order to gain an initial understanding of
operation mappings, it is advantageous to assume no dependence on
the database state. Mathematically, this means that operation
mappings will be functions RQ(s2)->RQ(s1)*. In terms of language
features, state-independence means there will be no constructs of
the form


if <yes-no query> then <db-interaction>
else <db-interaction> fi    or

while <yes-no query> do <db-interaction> od.


A mapping may still be dependent on the operation, the relation
and the tuple. We will make one further restriction, that there
is also no dependence on the tuple. This means, for example,
that if an operation mapping t defines the association:

delete(R,(0,1,2)) -> delete(S,(0,2)); delete(T,(1,5))

then it will also define the association:

delete(R,(6,7,8)) -> delete(S,(6,8)); delete(T,(7,5)).

This leads us to define operation mappings to be of the form:

op(R,w) -> op1(R1,w1); ... ;opn(Rn,wn),

where w,w1,...,wn are tuples of variables or constants such as
(v1,v2,5,v3).

Given an operation mapping t of this form, we need to specify
the element t(q) of RQ(s1)* given q in RQ(s2). The following
example will show how this happens:

s1: R(4)
s2: S(3)
m:  S = R[3='6'][4,1,2]
t:  insert(S,(v1,0,v2)) -> insert(R,(0,v2,7,v1))

If q is insert(S,(5,0,6)), then t(q) will be insert(R,(0,6,7,5)),
which we can obtain by setting v1=5 and v2=6. If q is
insert(S,(5,1,6)), then t(q) is undefined: no assignment of v1
and v2 can make the variable tuple (v1,0,v2) on the
left-hand-side of t match (5,1,6). This example illustrates the
tradeoff that had to be made between letting t be defined on all
members of RQ(s2) and not being able to talk about the correctness
of t, and letting t be a partial function but with better
correctness properties. For in this example, t could never be
correct for operations insert(S,(x,y,z)) where y was not '0'. So
in order to specify the result t(q) of applying t to an operation
q ∈ RQ(s2), we can say that if t has an association of the form:

op(R,w) -> op1(R1,w1); ... ;opn(Rn,wn),

then t(q)=q' if there is some valuation x=(x1,x2,...) such that
op(R,w/x/) is q, and op1(R1,w1/x/); ... ;opn(Rn,wn/x/) is q', where
wi/x/ denotes the result of substituting the values in x for the
variables in wi.

We  will  formalize  this  discussion  in  the  following  definition: 

Definition 6.4. Let s1, s2 ∈ RSch. An operation mapping t ∈
RMq(s1,s2) is a finite set of associations of the form:

op(R,w) -> op1(R1,w1); ... ;opn(Rn,wn),

where op,op1,...,opn are either 'insert' or 'delete', and
w,w1,...,wn are tuples whose components are either variable
symbols v1,v2,v3,..., constant symbols 0,1,2,3,..., or blank '-'.
If q ∈ RQ(s2), then t(q) = q' if and only if there exists a
valuation x=(x1,x2,x3,...), i.e., a sequence of constant symbols,
such that for some association in t of the form:

op(R,w) -> op1(R1,w1); ... ;opn(Rn,wn)

q is op(R,w/x/) and q' is op1(R1,w1/x/); ... ;opn(Rn,wn/x/). (This
implies that for each R ∈ s and variable tuple w, insert(R,w) and
delete(R,w) appear at most once on the left-hand-side of an
association in t.) The instantiation w/x/ is defined by:

w/x/[Y] = if w[Y] is a constant v then v
          else if w[Y] is '-' then '-'
          else if w[Y] is a variable vj then xj.

To make the notation more precise, VTup(n) is defined to be
the set of all variable n-tuples: n-tuples whose components are
either variables, constants or blanks. The relation < is
extended to variable tuples by treating variables as nonblank
values. Given s ∈ RSch, the set Ts(s) is defined to be the set
of all translation sequences over s, that is, the set of
right-hand-sides of operation mappings of the form:

op1(R1,w1); ... ;opn(Rn,wn),

where opi is 'insert' or 'delete', Ri ∈ s of degree k, and wi ∈
VTup(k).


□ 
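The way t(q) is obtained from a valuation can be sketched in
Python. The representation is an assumption of ours: variables
are strings beginning with "v", constants are integers, the blank
is "-", an association is a pair (left-hand-side, list of
right-hand-side operations), and every variable on a
right-hand-side is assumed to occur on the left-hand-side. The
function names are hypothetical.

```python
BLANK = "-"  # assumed representation of the blank coordinate

def is_var(c):
    return isinstance(c, str) and c.startswith("v")

def match(pattern, w):
    """Return a valuation (dict) x with pattern/x/ == w, or None."""
    if len(pattern) != len(w):
        return None
    x = {}
    for p, c in zip(pattern, w):
        if is_var(p):
            if x.setdefault(p, c) != c:  # repeated variable must agree
                return None
        elif p != c:                     # constants and blanks must match
            return None
    return x

def instantiate(pattern, x):
    """The instantiation w/x/ of a variable tuple."""
    return tuple(x[p] if is_var(p) else p for p in pattern)

def translate(t, q):
    """t(q), or None where the partial function t is undefined."""
    op, rel, w = q
    for (lop, lrel, lw), rhs in t:
        if (lop, lrel) == (op, rel):
            x = match(lw, w)
            if x is not None:
                return [(o, r, instantiate(p, x)) for o, r, p in rhs]
    return None

t = [(("insert", "S", ("v1", 0, "v2")),
      [("insert", "R", (0, "v2", 7, "v1"))])]
print(translate(t, ("insert", "S", (5, 0, 6))))
print(translate(t, ("insert", "S", (5, 1, 6))))
```

The second call returns None because no valuation makes
(v1,0,v2) match (5,1,6), mirroring the undefined case discussed
before the definition.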


6.2.  Properties of Operation Mappings

In  the  last  section  we  defined  the  data  model  operations  and 
the  mapping  model  operation  mappings  which  we  will  study.  In 
Chapter  2,  we  defined  a  number  of  properties  which  operation 
mappings  may  have  which  are  significant  to  database  management. 

In this section we will translate the abstract expression of some
of those properties into the more specific context of our models
RDm and RMm. Before doing this we will discuss the traditional
viewpoint of operation mappings.

In the literature, the problem of updating views is usually
approached using the following principles [ChGT]:

uniqueness rule: An insertion, deletion or update to a view is
permitted only if there is a unique operation which can be
applied to the underlying base relations and which will
result in exactly the specified changes to the user's
view.

rectangle rule: An insertion, deletion or update via a view
must affect only information visible within the rectangle
of the view.

The inclusion of weak, medium and dependent semantics has
already belied these rules. We add a few more counterexamples to
show that they are too restrictive in the present context.

Consider  a  database  containing  the  relations 
Purchases_This_Year  and  Purchases_This_Month,  and  also  the  view 
Purchases  which  is  the  union  of  the  previous  two  relations.  An 
executive  working  within  an  annual  budget  would  prefer  to  query 
and  update  the  Purchases  view,  since  working  with  one  relation  is 
easier  than  working  with  two.  This  executive's  assistant  who 
must  mail  out  payments  for  each  month's  purchases  wants  to  work 
with  the  base  relations,  not  with  the  view.  Mathematically, 
there  are  three  ways  to  translate  the  executive's  insertions  on 
Purchases  to  insertions  on  the  base  relations:  insert  into 
Purchases_This_Year, insert into Purchases_This_Month or insert
into both. By the above uniqueness rule, insertions to the
Purchases view should not be allowed. Yet when one understands
the semantics of the relations, it is clear that mapping
insertions on Purchases to insertions on Purchases_This_Month is
perfectly valid and is the intended mapping.

For  another  example,  consider  the  following  relations: 

s1: Volunteer(V_Name,Project_type)
    Sponsor(S_Name,P_Name)
    Project(P_Name,Project_type)
s2: Roster(S_Name,V_Name,P_Name,Project_type)
m:  Roster = Sponsor*Volunteer*Project

('*' indicates a join on common domain.) This could be a database
for a volunteer organization. The organization has two
registration terminals. At one, volunteers register by entering
their name and project types of interest into the Volunteer
relation. This is the only relation needed for volunteer
registration. When a person with funds wants to sponsor a
project, he registers at the sponsor desk. The view here is the
Roster relation. According to the practice of the organization,
sponsors always work on their projects. To register, for
example, John Doe as a sponsor of the project Paint_Church (of
type Painting), the tuple
(John Doe,John Doe,Paint_Church,Painting) would be inserted into
Roster. As a side-effect of this insertion, all volunteers
interested in painting will appear in Roster and John Doe may
note their names for further contact.
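The side-effect can be observed in a small Python sketch. Several
things here are assumptions of ours and are not given in the
text: the volunteers Ann and Bob are invented sample data, and
the insertion into Roster is taken to be translated into one
insertion into each of Sponsor, Volunteer and Project.

```python
def roster(sponsor, volunteer, project):
    """Roster = Sponsor*Volunteer*Project, joining on P_Name and
    Project_type; result tuples are (S_Name,V_Name,P_Name,Project_type)."""
    return {(s, v, p, ty)
            for (s, p) in sponsor
            for (p2, ty) in project if p2 == p
            for (v, ty2) in volunteer if ty2 == ty}

# Hypothetical state before John Doe registers as a sponsor.
volunteer = {("Ann", "Painting"), ("Bob", "Gardening")}
sponsor = set()
project = set()

# Assumed translation of inserting
# (John Doe,John Doe,Paint_Church,Painting) into Roster.
sponsor = sponsor | {("John Doe", "Paint_Church")}
volunteer = volunteer | {("John Doe", "Painting")}
project = project | {("Paint_Church", "Painting")}

for row in sorted(roster(sponsor, volunteer, project)):
    print(row)
```

Besides the inserted tuple, Roster now also pairs John Doe with
Ann, the other volunteer interested in painting.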

In this example, the update to the Roster view violates the
uniqueness rule because side-effects occur in the view. The
side-effects are desirable, however, and this kind of update
should not be disallowed.

As  a  third  example,  suppose  we  have  a  university  database  with 
the  following  base  relation  and  view: 

Student(Name, Number, Final_Standing)

Graduate(Name, Number) = Student[Final_Standing = 'pass']

Students given conditional degrees sometimes do not complete
their work satisfactorily. A person in the registrar's office
then deletes them from the Graduate view. This update is
translated:

delete Graduate where Name=x ->
    update Student.Final_Standing := 'fail'
    where Student.Name = x

The domain Final_Standing is not explicitly visible in the
Graduate view, yet the update has a valid meaning. This means
that not all updates violating the rectangle rule should be
disallowed.

It is also easy to give an example where information not even
implicitly visible in the view needs to be changed by updates to
the view. If the database is a hierarchy and one of the views
(used by some high authority) contains only the root segment,
then deletions from this view will, and must, affect other parts
of the database. The following schemas illustrate this:

s1: Employee
        Emp_Dependent

s2: Employee

If someone using the s2 view deletes an Employee, say, because of
a layoff, then the Emp_Dependents connected to this Employee
should and will be deleted, even though they are not visible in
s2.

We conclude that for purposes of studying general mapping
problems, the uniqueness and rectangle rules are overly
restrictive. As these examples have also shown, certain aspects
of mapping problems (for example, the decision of which part of a
union an insertion should be mapped to) require additional
semantics. These are, however, beyond the scope of this thesis.
The problems which do not require more semantics than the models
can represent include the ones discussed in Chapter 2: whether a
mapping is type 1, type 2, type 3, consistent and t-consistent.
Most of this chapter will concern itself with the type 3
property. We will also mention problems involved with the other
operation mapping properties.

The problem we are studying is then the following:

Problem 7. Let s1, s2 ∈ RSch; let m ∈ RMs(s1,s2), and let t ∈
RMq(s1,s2). Determine whether or not t is type 3 with respect to
m.

We recall from Chapter 2 that if we are given schemas s1, s2 ∈
RSch and a state mapping m ∈ RMs(s1,s2), then an operation
mapping t ∈ RMq(s1,s2) is type 3 with respect to m if t correctly
interprets s2-operations whenever the before- and after-structures
of s1 are states. In mathematical notation, this
means (Definition 2.6):

    if st ∈ RSt(s1), q ∈ RQ(s2), t is defined on q
    and t(q)(st) ∈ RSt(s1),
    then m(t(q)(st)) ⊆ q(m(st)).

We want to translate this statement into more specific statements
about RDm and RMm.

First, since an operation mapping consists of a finite number
of associations, we may restrict our attention to those mappings
having only one association. For if t1,...,tn are the distinct
single-association mappings whose union is t, then t is type 3 if
and only if each ti is type 3. When t is a single-association
mapping, we can write it as

    insert(R,w) -> ts   or   delete(R,w) -> ts,

where ts, a translation sequence (Definition 6.4), is an element
of RQ(s1)*. Then t(q) can be written as ts/x/, where x is the
valuation which makes the left-hand side of t match q, and
t(q)(st) can be written ts(st;x). The above commutative diagram
condition, namely:

    m(ts(st;x)) ⊆ q(m(st)),

can be interpreted as the statement that the conditions which
make structures admissible images of q (on m(st)) all hold on the
elements in m(ts(st;x)). Four possible conditions were given in
Definition 6.3: the ones for independent, strong, medium and
weak semantics. We therefore can break down the above general
inclusion condition into four subconditions (which are combined
in six ways according to the six possible semantics for RQ as
described previously). The next paragraphs derive these
conditions.

The condition on before- and after-structures for independent
semantics is:

    if t is op(R,w)->ts, S≠R and st' is a possible result
    of ts(st;x) (i.e., if st' ∈ ts(st;x)),
    then S(m(st')) = S(m(st)).

Now suppose the assignment S=e is in the state mapping m. Then
this condition can be expressed as:

    if t is op(R,w)->ts, then for any S≠R in s2
    such that S=e is in m, and for any st' ∈ ts(st;x),
    e(st') = e(st).

The condition on the before- and after-structures for strong
semantics is:

    If t is insert(R,w)->ts (delete(R,w)->ts),
    if x is any valuation, and if st' ∈ ts(st;x),
    then R(m(st')) = R(m(st)) + {w/x/}
         (R(m(st')) = R(m(st)) - ##w/x/).

Again, we can simplify this expression by including the
assignment R=e of m:

    If t is insert(R,w)->ts (delete(R,w)->ts),
    if x is any valuation and if st' ∈ ts(st;x),
    then e(st') = e(st) + {w/x/}   (e(st') = e(st) - ##w/x/).

For medium semantics, we similarly derive the condition:

    If t is insert(R,w)->ts (delete(R,w)->ts),
    if x is any valuation and if st' ∈ ts(st;x),
    then e(st) + {w/x/} ⊆ e(st')   (e(st') ⊆ e(st) - ##w/x/).

For weak semantics, we get the condition:

    If t is insert(R,w)->ts (delete(R,w)->ts),
    if x is any valuation and if st' ∈ ts(st;x),
    then w/x/ ∈ e(st')   (##w/x/ ∩ e(st') = ∅).


The above discussion provides a reduction of Problem 7 to the
following set of problems:

Problem 8. Let s ∈ RSch; let e ∈ Exp(s) of degree n; let w ∈
VTup(n), and let ts be a translation sequence over RQ(s).
Determine whether or not the following conditions hold:

For every state st ∈ RSt(s) and every valuation x, if st' ∈
ts(st;x) ∩ RSt(s), then:

(i)    e(st') = e(st) + {w/x/}
(ii)   e(st') = e(st) - ##w/x/
(iii)  e(st) + {w/x/} ⊆ e(st')
(iv)   e(st') ⊆ e(st) - ##w/x/
(v)    w/x/ ∈ e(st')
(vi)   ##w/x/ ∩ e(st') = ∅
(vii)  e(st') = e(st)
□

We will express these properties as seven predicates (with the
schema an implicit parameter):

Definition 6.5. Let s ∈ RSch; let e ∈ Exp(s) of degree n; let w
be a variable n-tuple, and let ts be a translation sequence over
RQ(s). Then:

(i)    Str-Ins(ts,e,w) = condition (i) above holds
(ii)   Str-Del(ts,e,w) = condition (ii) above holds
(iii)  Med-Ins(ts,e,w) = condition (iii) above holds
(iv)   Med-Del(ts,e,w) = condition (iv) above holds
(v)    Wk-Ins(ts,e,w)  = condition (v) above holds
(vi)   Wk-Del(ts,e,w)  = condition (vi) above holds
(vii)  Indep(ts,e)     = condition (vii) above holds
□

The names, of course, refer to the source of the definitions.

We record some elementary properties of these predicates in the
following theorem:

Theorem 6.1. Let s ∈ RSch, e ∈ Exp(s) of degree n, w ∈ VTup(n)
and ts ∈ Ts(s).

(i)   Str-Ins(ts,e,w) => Med-Ins(ts,e,w) => Wk-Ins(ts,e,w),

(ii)  Str-Del(ts,e,w) => Med-Del(ts,e,w) => Wk-Del(ts,e,w).
□
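
The implications of Theorem 6.1 can be checked on a single
before/after pair of values of e. The Python sketch below is an
illustration under stated assumptions: the function names are ours,
a blank '-' is modelled by None, and each function tests one
condition for one given pair of sets only, whereas the predicates
themselves quantify over all states and valuations.

```python
BLANK = None  # stands for the blank '-' in a deletion pattern


def matches(pattern, tup):
    """True if tup is in ##pattern: it agrees with pattern on every non-blank column."""
    return len(pattern) == len(tup) and all(
        p is BLANK or p == t for p, t in zip(pattern, tup)
    )


def without(before, pattern):
    """before - ##pattern."""
    return {t for t in before if not matches(pattern, t)}


def str_ins(before, after, w):   # condition (i)
    return after == before | {w}


def med_ins(before, after, w):   # condition (iii)
    return before | {w} <= after


def wk_ins(before, after, w):    # condition (v)
    return w in after


def str_del(before, after, w):   # condition (ii)
    return after == without(before, w)


def med_del(before, after, w):   # condition (iv)
    return after <= without(before, w)


def wk_del(before, after, w):    # condition (vi)
    return not any(matches(w, t) for t in after)


before = {(1, 2)}
after = {(1, 2), (3, 4), (5, 6)}   # w = (3, 4) arrived together with an extra tuple
print(str_ins(before, after, (3, 4)),
      med_ins(before, after, (3, 4)),
      wk_ins(before, after, (3, 4)))
```

On this pair the strong condition fails while the medium and weak
ones hold, so the implications of the theorem cannot be reversed
in general.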

It was previously remarked that an algorithm to derive an
operation mapping, given the schemas and state mappings, was not
meaningful in our context. However, as was mentioned then, we
can give a guide for constructing operation mappings. Now that
we are dealing only with single relational algebra expressions
and translation sequences, we can formulate the guide in a
recursive format. The input is an expression e, a variable tuple
w and an empty translation sequence ts. The output is a
translation sequence ts which will (possibly; see below) insert
w into e or delete w from e.


Translation Guide.

[1]   To insert w into base relation R, append 'insert(R,w)' to
      ts.

*[2]  To insert w into e[X], pick some w' such that w'[X] = w and
      insert w' into e.

[3]   To insert w into e[X=V], make sure that w[X]=V, then insert
      w into e.

[4]   To insert w into e[X=Y], make sure that w[X]=w[Y], then
      insert w into e.

[5]   To insert w into e1⊗e2, insert p1(w) into e1 and p2(w) into
      e2, where p1 projects onto e1, and p2 projects onto e2.

*[6]  To insert w into e1+e2, insert w into e1 or into e2 or into
      both e1 and e2.

[7]   To insert w into e1-e2, insert w into e1 and delete w from
      e2.

[8]   To delete w from base relation R, append 'delete(R,w)' to
      ts.

[9]   To delete w from e[X], delete w' from e, where w'[X]=w, and
      the other coordinates of w' are blank ('-').

[10]  To delete w from e[X=V] or e[X=Y], delete w from e, or do
      nothing if w violates the EQs X=V or X=Y, respectively (and
      therefore cannot be present in any case).

*[11] To delete w from e1⊗e2, delete p1(w) from e1, or delete
      p2(w) from e2, or delete p1(w) from e1 and p2(w) from e2.

[12]  To delete w from e1+e2, delete w from e1 and from e2.

*[13] To delete w from e1-e2, delete w from e1, or insert w into
      e2, or delete w from e1 and insert w into e2.

The  clauses  marked  with  an  asterisk  contain  choices,  and  the 
existence  of  several  choices  for  one  operation  is  the 
mathematical  reason  that  translations  must  be  based  on  some 
external  semantics,  as  we  noted  before. 
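
A few clauses of the guide can be sketched as a recursive Python
procedure. This is an illustration only, under several assumptions
of ours: expressions are nested tuples, columns are numbered from
0, only the clauses for base relations, selection, cross product,
union and difference are covered, tuples carry no blanks, and the
choice in a starred clause is delegated to a caller-supplied
function `choose` standing in for the external semantics.

```python
def degree(e):
    """Number of columns of an expression tree."""
    if e[0] == "base":
        return e[2]
    if e[0] == "prod":
        return degree(e[1]) + degree(e[2])
    return degree(e[1])  # sel, union and diff keep the degree of their first operand


def first(options):
    """Default resolution of a starred clause: take the first alternative."""
    return options[0]


def insert(e, w, ts, choose=first):
    kind = e[0]
    if kind == "base":                        # clause [1]
        ts.append(("insert", e[1], w))
    elif kind == "sel":                       # clause [3], e = ("sel", sub, X, V)
        if w[e[2]] != e[3]:
            raise ValueError("w does not satisfy X=V")
        insert(e[1], w, ts, choose)
    elif kind == "prod":                      # clause [5]
        n1 = degree(e[1])
        insert(e[1], w[:n1], ts, choose)
        insert(e[2], w[n1:], ts, choose)
    elif kind == "union":                     # clause [6], starred
        pick = choose(("left", "right", "both"))
        if pick in ("left", "both"):
            insert(e[1], w, ts, choose)
        if pick in ("right", "both"):
            insert(e[2], w, ts, choose)
    elif kind == "diff":                      # clause [7]
        insert(e[1], w, ts, choose)
        delete(e[2], w, ts, choose)
    return ts


def delete(e, w, ts, choose=first):
    kind = e[0]
    if kind == "base":                        # clause [8]
        ts.append(("delete", e[1], w))
    elif kind == "sel":                       # clause [10]
        if w[e[2]] == e[3]:                   # otherwise w cannot be present
            delete(e[1], w, ts, choose)
    elif kind == "prod":                      # clause [11], starred
        n1 = degree(e[1])
        pick = choose(("left", "right", "both"))
        if pick in ("left", "both"):
            delete(e[1], w[:n1], ts, choose)
        if pick in ("right", "both"):
            delete(e[2], w[n1:], ts, choose)
    elif kind == "union":                     # clause [12]
        delete(e[1], w, ts, choose)
        delete(e[2], w, ts, choose)
    elif kind == "diff":                      # clause [13], starred
        pick = choose(("left", "right", "both"))
        if pick in ("left", "both"):
            delete(e[1], w, ts, choose)
        if pick in ("right", "both"):
            insert(e[2], w, ts, choose)
    return ts


A = ("base", "A", 2)
B = ("base", "B", 2)
union_expr = ("union", A, ("sel", B, 0, "x"))
left_only = insert(union_expr, ("x", 1), [])
both_sides = insert(union_expr, ("x", 1), [], choose=lambda options: "both")
difference = insert(("diff", A, B), ("x", 1), [])
```

The two calls on `union_expr` differ only in the `choose` argument,
which is where the external semantics would enter.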

6.3 Formulation and Soundness of
Procedures for Weak Type 3 Operation Mappings

In the last section we arrived at a formulation of the type 3
property for operation mappings for the models RDm and RMm. In
this section we will study two of the derived predicates, Wk-Ins
and Wk-Del, which relate to the type 3 property for weak
semantics, and we will formulate decision procedures for them.

We will also show that the procedures are sound, that is, that if
the procedures say that the operation mapping has the type 3
property with respect to weak semantics, then it really does.

At the end of the last section we gave a recursive guide for
deriving translation sequences, but it had some nondeterministic
choices for certain operations. Even if there were no choices in
the translation of inserts and deletes, a recursive procedure
similar to the one above would not be able to ensure that the
translation sequence output would satisfy any of the seven
predicates of Definition 6.5. To see this, consider the
expression R-R. Obviously, nothing can be inserted into this
expression, but a recursive procedure which blindly generated a
translation sequence would not detect this fact. For a less
extreme example, consider the following schema, expression and
tuple:

s: R(2), S(2)
e: (R[2=1]S)[4=2]R
w: (0,1,1,2,2,1)

The translation sequence generated for inserting w into e would
be:

insert(R,(0,1)); insert(S,(1,2)); insert(R,(2,1)).

In every possible initial state, the final state would have at
least two new tuples: (0,1,1,2,2,1) and (2,1,1,2,2,1). Thus, it
would never be possible for the predicate Str-Ins to be
satisfied; yet a simple recursive procedure for generating
translation sequences could not detect this. What are needed are
procedures which take a global view of a translation sequence,
with no regard to how the sequence was generated. In other
words, we want procedures for calculating the value of these
predicates when the arguments are already given, and we will not
study the problem: given e and w, generate a translation
sequence ts such that Wk-Ins(ts,e,w) or Wk-Del(ts,e,w) is true.
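
The two new tuples can be reproduced on the empty initial state.
The Python sketch below is an illustration under assumptions of
ours: inserts add exactly the given tuple, and the second join
condition is read as "column 4 of the left operand equals column 1
of the right-hand R", which is the reading under which w itself
belongs to e after the three insertions.

```python
def evaluate(R, S):
    """Value of e on a state, under the assumed reading of the join conditions."""
    return {
        r1 + s + r2
        for r1 in R
        for s in S
        for r2 in R
        if r1[1] == s[0] and s[1] == r2[0]
    }


def apply(ts, R, S):
    """Run a translation sequence on copies of the base relations."""
    state = {"R": set(R), "S": set(S)}
    for op, rel, tup in ts:
        if op == "insert":
            state[rel].add(tup)
        else:
            state[rel].discard(tup)
    return state["R"], state["S"]


ts = [("insert", "R", (0, 1)), ("insert", "S", (1, 2)), ("insert", "R", (2, 1))]
R0, S0 = set(), set()
R1, S1 = apply(ts, R0, S0)
new_tuples = evaluate(R1, S1) - evaluate(R0, S0)
print(sorted(new_tuples))
```

The second tuple arises because the inserted R-tuple (2,1) can
also play the role of the first operand of the join.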

We have reduced and simplified the problem of deciding when an
operation mapping is type 3 to the question of deriving decision
procedures for the seven predicates defined above. The way we
proceed closely resembles what was done in the case of structure
mappings. Consistency for structure mappings required that
certain constraints were true over a certain infinite class of
structures. This infinite problem was shown to be equivalent to
a certain finite one, namely, constructing the sets Drv(e).
Determining the truth value of Wk-Ins and Wk-Del is also an
infinite problem since there are again quantifications over the
infinite domains of states and valuations. The finite problem
which we want to show equivalent to the infinite one constructs
sets of tuples which have certain properties. We will informally
discuss these sets for each relational operator and then give the
formal definitions. Note that in informal discussions, the
variable valuations will, for simplicity, often not be mentioned.
We also assume that the semantics of RQ are independent.

For Wk-Ins(ts,e,w) to be true, ts/x/ must always result in
w/x/ being an element of e for each valuation x.

Suppose e is a base relation R. If RQ has weak semantics,
then there can be at most one tuple w always inserted into R, and
this is when insert(R,w) is the last operation in ts involving R.
This is because any other operation on R might cause w to be
deleted. If RQ has medium semantics, then the tuples always
inserted into R will be w1,...,wn, where
insert(R,w1),...,insert(R,wn) are the last operations in ts
involving R. This is because any deletion operation could cause
one of the wi to be deleted. If RQ has strong semantics, then w
will always be inserted into R if an operation insert(R,w)
appears in ts such that no delete(R,w') occurs after this
occurrence, where w'<w.

When the expression is a projection e[X], ts will cause w to
be an element of e[X] as long as ts causes some w' to be a member
of e, where w'[X]=w. In other words, to insert a tuple into a
projection, insert some extension of it into the unprojected
relation.

If the expression is a selection e[X=V], ts will insert w into
the expression if ts inserts w into e and if w[X]=V. A similar
rule holds for restriction.

If the expression is a cross product e1⊗e2, ts will insert w
into e if it inserts p1(w) into e1 and p2(w) into e2.

For a union e1+e2, ts must insert w into e1 or into e2.

Now consider a set difference, say e1-e2. First, ts must be
such that w is always made an element of e1. Second, w must
never be allowed to appear in e2. This condition will hold if ts
causes w always to be deleted from e2, or if w can never appear
in e2 because it violates some (EQ) constraint.

Following the above discussion, we will define the set
Ins(ts,e) which captures all tuples w which ts causes to be
inserted into e. To be more precise, the set Ins(ts,e) will
contain all variable tuples w such that if x is a valuation, then
ts/x/ will insert w/x/ into e.

As noted above, the formula for set difference needs to be
able to calculate when a tuple can never appear in an expression
because of inherent constraint violations. For example, if e is
R[1='3'], there is a valid constraint 1='3' on e. Any tuple
which does not have a '3' in column 1 will violate this
constraint and can never be inserted into e.

The next definition specifies a set-function Cv which, given an
expression e, gives all the tuples which violate EQs on e. (The
set is infinite but recursive.) We then define the Ins operator.

Definition 6.6. Let s ∈ RSch, and let e ∈ Exp(s) of degree n. A
tuple w ∈ VTup(n) is a member of the set Cv(e) of constraint
violators under the following conditions:

Let E1,...,Eq be the equivalence classes of domains under '='
in e. Define value sets V1,...,Vq as follows: For each X ∈ Ei,

(i)   if w[X]=v, where v≠'-' and v is not a variable,
      then v ∈ Vi;

(ii)  if X=V ∈ Drv(e), then V ∈ Vi, and

(iii) if w[X]='vj' (a variable), and for some Y ∈ Ek, w[Y]='vj',
      then Vk ⊆ Vi.

If any one of V1,...,Vq has more than one element, then w ∈
Cv(e); otherwise w ∉ Cv(e).
□
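
The membership test of Definition 6.6 can be sketched in Python as
follows. The sketch is an illustration under assumptions of ours:
the equivalence classes of domains and the constants of the
derivable EQs X=V are passed in directly rather than computed from
Drv(e), columns are numbered from 0, blanks are modelled by None,
and variables by a small class `Var`.

```python
from dataclasses import dataclass

BLANK = None  # stands for the blank '-'


@dataclass(frozen=True)
class Var:
    name: str


def is_constraint_violator(w, classes, constants):
    """Decide w in Cv(e), given the classes E1..Eq and the constants of the EQs X=V."""
    value_sets = [set() for _ in classes]
    for i, cls in enumerate(classes):
        for col in cls:
            entry = w[col]
            if entry is not BLANK and not isinstance(entry, Var):
                value_sets[i].add(entry)            # clause (i)
            if col in constants:
                value_sets[i].add(constants[col])   # clause (ii)
    changed = True
    while changed:                                  # clause (iii), to a fixpoint
        changed = False
        for i, cls_i in enumerate(classes):
            vars_i = {w[c] for c in cls_i if isinstance(w[c], Var)}
            for k, cls_k in enumerate(classes):
                shares_variable = any(w[c] in vars_i for c in cls_k)
                if i != k and shares_variable and not value_sets[k] <= value_sets[i]:
                    value_sets[i] |= value_sets[k]
                    changed = True
    return any(len(values) > 1 for values in value_sets)


# e = R[1='3'] with R of degree 2: two singleton classes, constant '3' on column 0.
classes = [{0}, {1}]
constants = {0: "3"}
print(is_constraint_violator(("4", BLANK), classes, constants))
print(is_constraint_violator(("3", BLANK), classes, constants))
```

A variable shared between two classes merges their value sets, so
a tuple such as (v, a, v, b) over classes {1,2} and {3,4} is
reported as a violator whenever a and b differ.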

Theorem 6.2. Let s ∈ RSch, and e ∈ Exp(s) of degree n. Then

    Cv(e) = {w ∈ VTup(n) : some EQ of Drv(e) is false in {w'}
             for every w' > w/z/ and all valuations z}.

Proof. If w ∈ Cv(e), then some value set, say V1, has at least
two members. If these two values were assigned by clauses (i)
and/or (ii), then there will be an EQ violation simply among the
constant values of w. If the two values were assigned because of
clause (iii), then there are domains X and Y in different
equivalence classes such that w[X]=vj=w[Y] and domains X' and Y'
such that w[X']=a, w[Y']=b, a≠b and X=X' and Y=Y' are in Drv(e).
Then no matter what value is given to variable vj in a valuation
z, either X=X' or Y=Y' will be violated in w/z/.

Finally, replacing blanks by values will not remove these
violations.

The other direction follows similarly.
□


Definition 6.7. Let s ∈ RSch; let e ∈ Exp(s) of degree n, and
assume the semantics of RQ are independent.

Let ts ∈ Ts(s). The set Ins(ts,e) (with s implicit and often
just written Ins(e)) is defined by induction on e as follows:

[1] Ins(R) = if RQ has weak semantics
                then {w} if ts = ts';insert(R,w);ts" and
                         R does not appear in ts"
             if RQ has medium semantics
                then {w : insert(R,w) ∈ ts and no delete(R,w')
                          occurs to the right of this insert}
             if RQ has strong semantics
                then {w : insert(R,w) ∈ ts, and no delete(R,w')
                          appears to the right of insert(R,w),
                          where w'<w}

[2] Ins(e[X])   = Ins(e)[X]
[3] Ins(e[X=Y]) = Ins(e)[X=Y]
[4] Ins(e[X=V]) = Ins(e)[X=V]
[5] Ins(e1⊗e2)  = Ins(e1) ⊗ Ins(e2)
[6] Ins(e1+e2)  = Ins(e1) + Ins(e2)
[7] Ins(e1-e2)  = Ins(e1) ∩ (Del(e2) + Cv(e2))

(Del(e) will be defined shortly.)

In these definitions the relational algebra operators on the
left are parts of formal expressions; the ones on the right are
"metalanguage" operators which specify what to do with the sets
of variable tuples. Restriction and selection on variable tuples
treat variables as another kind of constant. Thus v1=v1 is true,
but v1=v3 and v1=2 are false.
□
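
Clauses [1] to [6] of the definition of Ins can be sketched in
Python for the case of weak semantics. The sketch is an
illustration under assumptions of ours: expressions are nested
tuples, columns are numbered from 0, variables are written as the
strings "v1", "v2" and compared like any other constant, and the
difference clause [7] is left out because it needs Del and Cv.

```python
def ins(ts, e):
    """Ins(ts, e) for weak semantics, clauses [1]-[6] only."""
    kind = e[0]
    if kind == "base":
        # Clause [1], weak case: the last operation involving R must be an insert.
        ops = [op for op in ts if op[1] == e[1]]
        if ops and ops[-1][0] == "insert":
            return {ops[-1][2]}
        return set()
    if kind == "proj":                       # clause [2], e = ("proj", sub, X)
        return {tuple(w[c] for c in e[2]) for w in ins(ts, e[1])}
    if kind == "restr":                      # clause [3], e = ("restr", sub, X, Y)
        return {w for w in ins(ts, e[1]) if w[e[2]] == w[e[3]]}
    if kind == "sel":                        # clause [4], e = ("sel", sub, X, V)
        return {w for w in ins(ts, e[1]) if w[e[2]] == e[3]}
    if kind == "prod":                       # clause [5]
        return {a + b for a in ins(ts, e[1]) for b in ins(ts, e[2])}
    if kind == "union":                      # clause [6]
        return ins(ts, e[1]) | ins(ts, e[2])
    raise NotImplementedError(kind)          # clause [7] is not sketched here


R = ("base", "R")
S = ("base", "S")
# (R x S)[2=3][1,4] in the numbering of the text, i.e. columns 1, 2 and 0, 3 here.
e = ("proj", ("restr", ("prod", R, S), 1, 2), (0, 3))
ts = [("insert", "R", ("v1", 1)), ("insert", "S", (1, "v2"))]
always_inserted = ins(ts, e)
after_delete = ins(ts + [("delete", "R", ("v1", 1))], e)
print(always_inserted, after_delete)
```

Appending a delete on R empties the result, since under weak
semantics only a trailing insert on R is guaranteed to survive.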


Now let us consider deletion. For Wk-Del(ts,e,w) to be true,
ts/x/ must always cause the set ##w/x/ to be deleted from e, for
each valuation x. We will informally discuss this condition for
base relations and for each relational algebra operator.

Assume e is a base relation R. If RQ has weak semantics, then
if ts contains delete(R,w) and after this operation only
operations not involving R occur, then the set ##w/x/ will always
be deleted from R. If RQ has medium semantics and if ts can be
written ts1;ts2, where in ts2 the only operations on R are
delete(R,w1),...,delete(R,wn), then the sets ##w1/x/,...,##wn/x/
will always be deleted from R. If RQ has strong semantics, then
the set ##w/x/ will always be deleted from R if delete(R,w) is in
ts and no operation insert(R,w') appears after it, where w'>w.

An  example  will  illustrate  what  we  mean: 

s:  R(4)

ts: delete(R,(v1,-,2,-)); insert(R,(v1,0,2,3));
    delete(R,(v1,-,-,4))

After the first operation, we should calculate Del(e) =
#(v1,-,2,-). After the second operation, Del(e) would be

    {w ∈ VTup(4) : w[1]=v1 & w[3]=2 & w[2]≠0 & w[4]≠3}.

After the third operation, Del(e) would be

    {w ∈ VTup(4) : (w[1]=v1 & w[3]=2 & w[2]≠0 & w[4]≠3) or
                   (w[1]=v1 & w[4]=4)}

For a projection e[X], w will be deleted by ts if ts deletes
every extension of w from e. This means that ts must delete w'
from e, where w'[X] = w and w'[X~] = '-' (X~ are the domains of
e not in X). This is because if w" is any extension of w to all
domains of e, then w">w'.

To delete w from a restriction or selection, ts must delete w
from the unrestricted, respectively, unselected relation.

In a cross product e1⊗e2, ts will delete w if it deletes p1(w)
from e1 or p2(w) from e2 (or if it deletes both). The other half
of w can be anything, and so the tuple whose other half is all
blanks will be in Del(e1⊗e2).

To delete w from a set difference e1-e2, we may delete w from
e1 or insert it into e2.

When tuples contain blanks, there are interactions possible
between the blanks and the constraints valid in the expression.
Consider the following example:

s1: R(3)
s2: S(3)
m:  S = R[2=3]

t1: delete(S,(v1,v2,-)) -> delete(R,(v1,v2,v2))
t2: delete(S,(v1,v2,-)) -> delete(R,(v1,-,v2))

Both t1 and t2 are correct mappings since every tuple which can
appear in S will have equal values in domains 2 and 3. In other
words, the tuples (v1,v2,-), (v1,v2,v2) and (v1,-,v2) all specify
the same set to be deleted from S, namely {(v1,v2,v2)}.
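
The equivalence of the three deletion patterns can be checked by
enumeration over a small domain. The Python sketch below is an
illustration under assumptions of ours: the three-element domain
and the sample valuation are arbitrary, and a blank '-' is modelled
by None.

```python
from itertools import product

BLANK = None  # stands for the blank '-'


def matches(pattern, tup):
    """True if tup agrees with pattern on every non-blank column."""
    return all(p is BLANK or p == t for p, t in zip(pattern, tup))


domain = ["a", "b", "c"]
# Every tuple that can appear in S = R[2=3] has equal 2nd and 3rd columns.
possible_in_S = {t for t in product(domain, repeat=3) if t[1] == t[2]}


def deleted_from_S(pattern):
    """The tuples of S that a delete with this pattern can remove."""
    return {t for t in possible_in_S if matches(pattern, t)}


v1, v2 = "a", "b"  # one sample valuation of the variables
results = [
    deleted_from_S((v1, v2, BLANK)),
    deleted_from_S((v1, v2, v2)),
    deleted_from_S((v1, BLANK, v2)),
]
print(results)
```

All three patterns select the single tuple (v1,v2,v2), which is
why the blanks may be moved within an equivalence class of domains
without changing the meaning of the deletion.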

Thus in addition to generating deletion sets as discussed
above, we must "close" these sets with respect to the movement of
blanks due to EQs. We formally express this as an operator Mb
(Move blanks). The operator takes two arguments, an expression
and a set of tuples. The result is a set of tuples derived from
the argument set where blanks have been moved according to the
constraints derivable on the expression.

Definition 6.8. Let s ∈ RSch; let T ⊆ VTup(n), and let e ∈
Exp(s) of degree n. The set Mb(e,T) is defined by the rules:

[1] If w ∈ T-Cv(e) and w[X]≠'-', then for each domain Y in the
    equivalence class X/= (in Drv(e)), every tuple w' such that
    w'[Y]=w[X], w'[Y']=('-' or w[X]) for every other Y' ∈ X/=, and
    w'[Z]=w[Z] for Z ∉ X/=, is in Mb(e,T).

[2] If w ∈ T-Cv(e) and X=V ∈ Drv(e), then every tuple w' such
    that w'[Y]=('-' or V) for every Y ∈ X/= and w'[Z]=w[Z] for
    every Z ∉ X/=, is in Mb(e,T).
□

Using  this  operator,  and  following  the  previous  discussion,  we 
define  the  set  Del(ts,e)  which  captures  all  of  the  tuples  which 
ts  always  causes  to  be  deleted  from  e.  The  sets  Del(ts,e)  are 
not  finite,  but  it  can  be  seen  that  they  are  recursive  in  the 
sense  of  there  being  a  decision  procedure  for  determining  their 
members. 

Definition 6.9. Let s ∈ RSch; let ts be a translation sequence,
and let e be an expression over s. The set Del(ts,e) (often
written Del(e)) is defined by induction on e by

[1] Del(R) = Mb(R,T),
    where T is defined as follows.
       if RQ has weak semantics
          then T = #w, if ts = ts';delete(R,w);ts" and
                       R does not appear in ts"
       if RQ has medium semantics
          then T = #{w : delete(R,w) ∈ ts and
                         no insert(R,w') occurs to the
                         right of this delete}
       if RQ has strong semantics
          then T = #{w : delete(R,w) ∈ ts} -
                   {w : w < w' for some w' ∈ Ins(R)}

[2] Del(e[X])   = Del(e)[X~='-'][X]
[3] Del(e[X=Y]) = Mb(e[X=Y], Del(e))
[4] Del(e[X=V]) = Mb(e[X=V], Del(e))
[5] Del(e1⊗e2)  = Del(e1) ⊗ #(-,...,-) + #(-,...,-) ⊗ Del(e2)
[6] Del(e1+e2)  = Del(e1) ∩ Del(e2)
[7] Del(e1-e2)  = Del(e1) + Ins(e2)
□


We have now presented algorithms for generating two recursive
sets of tuples from a given translation sequence (with respect to
a given expression). We want to show that truth of the
statements Wk-Ins(ts,e,w) and Wk-Del(ts,e,w) is equivalent,
respectively, to membership of w in Ins(ts,e) and in
Del(ts,e)+Cv(e). This problem, like the constraint problem, has
soundness and completeness parts. Soundness means that only
tuples satisfying the predicates have membership in Ins or Del.
Completeness means that every tuple satisfying the predicates has
the desired membership. Soundness is proven first. Note that
because Ins and Del are linked by the set difference operator,
the proof is technically a simultaneous induction on e. However,
we will textually separate the two parts for easier reading.

The soundness proof for Del requires a preliminary theorem.
We need to show that the Mb operator preserves soundness. That
is, if the original set T has the property that all its members
satisfy Wk-Del, then applying Mb results in a set whose members
also satisfy Wk-Del. We prove this and then the soundness
theorem.

Theorem 6.3. Let s ∈ RSch, e ∈ Exp(s) of degree n, T ⊆ VTup(n),
and ts ∈ Ts(s). Then the condition

    w ∈ T => Wk-Del(ts,e,w)

implies the condition

    w ∈ Mb(e,T) => Wk-Del(ts,e,w).

Proof. Suppose w ∈ T-Cv(e) and w[X]≠'-'. Let st be a state and
z a valuation. Let Y ∈ X/=, and let w' be as in clause [1]. If
w" ∈ ##w'/z/ ∩ e(ts(st;z)), then w" ∈ ##w/z/ ∩ e(ts(st;z)).
Since the latter set is empty by hypothesis, so is the former,
i.e., Wk-Del(ts,e,w') holds.

If w ∈ T-Cv(e), X=V ∈ Drv(e), and w' is as in clause [2], then
for any state st and valuation z, if w" ∈ ##w'/z/ ∩ e(ts(st;z)),
then w" ∈ ##w/z/ ∩ e(ts(st;z)), and the same argument applies.
□

We  are  now  ready  to  prove  the  soundness  theorem. 

Theorem 6.4. Let s ∈ RSch, e ∈ Exp(s) of degree n, ts ∈ Ts(s)
and w ∈ VTup(n). Then as long as the semantics are independent:

(i)   if w ∈ Ins(ts,e), then Wk-Ins(ts,e,w), and

(ii)  if w ∈ Del(ts,e), then Wk-Del(ts,e,w).

Proof. Part (i) is proved first.

Suppose w ∈ Ins(R), and assume the semantics are strong. Then
ts can be written ts1;insert(R,w);ts2, where ts2 contains no
operation delete(R,w') with w'<w. It follows that for every
state st and valuation x, w/x/ ∈ e(ts1/x/;insert(R,w/x/)(st)),
and therefore, w/x/ ∈ e(ts(st;x)).

The cases for weak and medium semantics are similar.

If w ∈ Ins(e[X]) = Ins(e)[X], then for some w' ∈ Ins(e),
w=w'[X]. Since Wk-Ins(ts,e,w') is true by induction, given any
state st and valuation z, w'/z/ ∈ e(ts(st;z)), and this means
w'[X]/z/ ∈ e[X](ts(st;z)), i.e., Wk-Ins(ts,e[X],w).


If w ∈ Ins(e[X=Y]) = Ins(e)[X=Y], then w ∈ Ins(e), and so
Wk-Ins(ts,e,w). Since w[X]=w[Y], Wk-Ins(ts,e[X=Y],w).

If w ∈ Ins(e[X=V]) = Ins(e)[X=V], then w ∈ Ins(e), and so
Wk-Ins(ts,e,w). Since w[X]=V, Wk-Ins(ts,e[X=V],w).

If w ∈ Ins(e1⊗e2) = Ins(e1)⊗Ins(e2), then p1(w) ∈ Ins(e1) and
p2(w) ∈ Ins(e2), where p1 projects on the first deg(e1)
coordinates and p2 projects on the deg(e1)+1 through
deg(e1)+deg(e2) coordinates. By induction, Wk-Ins(ts,e1,p1(w))
and Wk-Ins(ts,e2,p2(w)), and it follows that Wk-Ins(ts,e1⊗e2,w).

If w ∈ Ins(e1+e2) = Ins(e1)+Ins(e2), then w ∈ Ins(e1) or w ∈
Ins(e2). Hence, Wk-Ins(ts,e1,w) or Wk-Ins(ts,e2,w). In either
case, Wk-Ins(ts,e1+e2,w).

If w ∈ Ins(e1-e2) = Ins(e1) ∩ (Del(e2)+Cv(e2)), then w ∈ Ins(e1)
and w ∈ Del(e2)+Cv(e2). If w ∈ Del(e2), then by induction,
Wk-Del(ts,e2,w), and if w ∈ Cv(e2), then Wk-Del(ts,e2,w) is
trivially true. We also have by induction Wk-Ins(ts,e1,w), hence
Wk-Ins(ts,e1-e2,w).

Now we prove part (ii).

Suppose w ∈ Del(R), and assume the semantics are strong. Then
ts can be written ts1;delete(R,w');ts2, where w'<w and in ts2
there is no insert(R,w") with w">w. It follows that for every
state st and valuation x, ##w/x/ ∩ R(ts1/x/;delete(R,w'/x/)) ⊆
##w'/x/ ∩ R(ts1/x/;delete(R,w'/x/)) = ∅, and therefore
##w/x/ ∩ R(ts(st;x)) = ∅.

The cases for weak and medium semantics are similar.

Suppose w ∈ Del(e[X]) = Del(e)[X~='-'][X]. By the induction
hypothesis, Wk-Del(ts,e,w') is true for every w' ∈ Del(e). In
particular, the w' such that w'[X] = w and w'[X~] = '-' has this
property. This implies that for any state st and valuation y,
##w/y/ ∩ e[X](ts(st;y)) = ∅, since a non-empty intersection would
mean that ##w'/y/ ∩ e(ts(st;y)) ≠ ∅. Hence, Wk-Del(ts,e[X],w) is
true.

If  w  €  Del  (e)  then  Wk-Del  (ts , e , w )  by  the  induction  hypothesis, 
so  Wk— Del  (ts,e[  X=Y  ],w)  and  Wk—  Del  (ts,  etXfif  ],w)  since  e[X=Y]  (st)  C 

e(st)  and  e[X  =  V](st)  C  e(st)  for  all  st.  By  Theorem  6.3,  w  € 


(IV, 6)  173 


Del(e[X=Y])  and  w  €  Del(e[X^V])  implies,  respectively, 

Wk-Del  (ts,e[X=Y],w)  and  Wk-Del  (ts  ,e[X=V  ],  w)  . 

If w ∈ Del(e1)⊗#{(-,...,-)}, then p1(w) ∈ Del(e1) and so
Wk-Del(ts,e1,p1(w)). It follows that Wk-Del(ts,e1⊗e2,w) since
for any state st and valuation x, if w' ∈
##w/x/•(e1⊗e2)(ts(st;x)), then p1(w') ∈ ##p1(w)/x/•e1(ts(st;x))
which is empty. Similarly, w ∈ #{(-,...,-)}⊗Del(e2) implies
Wk-Del(ts,e1⊗e2,w).

For the case of union, suppose w ∈ Del(e1)•Del(e2), i.e., w ∈
Del(e1) and w ∈ Del(e2). By the induction hypothesis, for any
state st and valuation x, ##w/x/•e1(ts(st;x)) = ∅ and
##w/x/•e2(ts(st;x)) = ∅, from which it follows that
##w/x/•(e1+e2)(ts(st;x)) = ∅.

If w ∈ Del(e1) or w ∈ Ins(e2), then either Wk-Del(ts,e1,w) and
Wk-Del(ts,e1-e2,w) since e1-e2 ⊆ e1, or Wk-Ins(ts,e2,w), which
also means Wk-Del(ts,e1-e2,w).

□


6.4.  Completeness of Procedures

We have now shown that Ins and Del do not make any incorrect
calculations.  Next,  we  want  to  show  that  their  calculations  are 
complete. 

The completeness property states that if Wk-Ins(ts,e,w) is
true, then w ∈ Ins(ts,e); i.e., it says that Ins does not "miss"
any tuples which are always inserted into e. For deletion the
property is that if Wk-Del(ts,e,w) is true, then either w is in
Del(ts,e) or else w ∈ Cv(e). The second condition is necessary
because ##w may never intersect any state of e because of
internal (EQ) constraint violations. In this case,
Wk-Del(ts,e,w) is trivially true. The formal statement of the
property is the following:

Proposition. Let s ∈ RSch, let e ∈ Exp(s) of degree n, ts ∈
Ts(s) and w ∈ VTup(n). Then

(i) if Wk-Ins(ts,e,w), then w ∈ Ins(ts,e), and

(ii) if Wk-Del(ts,e,w), then w ∈ Del(ts,e)+Cv(e).




□

A natural approach to proving completeness is to use induction
on the length of e. In such a proof, we would want to show for
the projection operator that Wk-Ins(ts,e[X],w) implies w ∈
Ins(ts,e[X]) = Ins(ts,e)[X]. The hypothesis says that for all
states st and valuations y, w/y/ ∈ e[X](ts(st;y)), i.e., that for
every state st and valuation y, there is a tuple w' (depending on
st) such that w'[X]/y/=w/y/ and w' ∈ e(ts(st;y)). In order to
apply the induction hypothesis, that Wk-Ins(ts,e,w') implies w' ∈
Ins(e), we need to show that at least one w' is independent of st
and y, i.e., that w'/y/ ∈ e(ts(st;y)) for all st and y. We can
write down Predicate Calculus formulas which make this problem
more precise:

What we have is

(∀y)(∀st)(∃w')(w'[X]/y/=w/y/ • w'/y/ ∈ e(ts(st;y)))

and what we want is

(∀y)(∃w')(w'[X]/y/=w/y/ • (∀st)(w'/y/ ∈ e(ts(st;y)))).

Unfortunately,  this  property  does  not  always  hold.  Consider  the 
following  example: 

s:  R(4), S(4), S:1->2,3,4
e:  (R-S)[3,4]
ts: insert(R,(0,1,2,3));insert(R,(4,1,2,3))

Ins(R) = {(0,1,2,3),(4,1,2,3)}
Del(S) = ∅
Cv(S) = ∅
Ins(R-S) = Ins(R)•(Del(S)+Cv(S)) = ∅
Ins(e) = Ins(R-S)[3,4] = ∅[3,4] = ∅

Here we have (2,3) ∉ Ins(e). It can be seen, however, that (2,3)
is always inserted into e. For let st be any state. If
(0,1,2,3) ∉ S(st), then (0,1,2,3) ∉ S(ts(st)), and so we will
have (0,1,2,3) ∈ (R-S)(ts(st)), giving (2,3) ∈ e(ts(st)). On the
other hand, if (0,1,2,3) ∈ S(st), we must have (4,1,2,3) ∉ S(st)
since domain 1 is a key of S. Hence, (4,1,2,3) ∉ S(ts(st)), and
so (4,1,2,3) ∈ (R-S)(ts(st)), also giving (2,3) ∈ e(ts(st)).

Thus, although no single tuple is always inserted into the
unprojected relation R-S, the tuple (2,3) is still always
inserted into (R-S)[3,4].

A similar problem occurs with union. We would like to prove
that Wk-Ins(ts,e1+e2,w) implies w ∈ Ins(e1+e2) = Ins(e1)+Ins(e2).
As a predicate calculus statement, the hypothesis reads

(∀x)(∀st)(w/x/ ∈ e1(ts(st;x)) + w/x/ ∈ e2(ts(st;x))),

and the statement we need to derive to invoke the induction
hypothesis is

(∀x)(∀st)(w/x/ ∈ e1(ts(st;x))) + (∀x)(∀st)(w/x/ ∈ e2(ts(st;x))).

However,  universal  quantifiers  generally  do  not  distribute  over 
disjunction.  We  can  give  an  example  where  this  is  the  case: 

s:  R(2), S(2)
e:  (R-S)+(R•S)
ts: insert(R,(0,0))

Ins(R) = {(0,0)}
Ins(S) = Del(S) = ∅
Ins(R-S) = {(0,0)}•∅ = ∅
Ins(R•S) = {(0,0)}•∅ = ∅  (this is a derived rule)
Ins(e) = ∅+∅ = ∅

Thus (0,0) ∉ Ins(e). However, (0,0) is always inserted into e.
For, given any state st, if (0,0) ∈ S(ts(st)), then (0,0) ∈
(R•S)(ts(st)), and if (0,0) ∉ S(ts(st)), then (0,0) ∈
(R-S)(ts(st)). In either case, (0,0) ∈ e(ts(st)). This fact can
also be seen from the equivalence R = (R-S)+(R•S). In this
example, (0,0) is always inserted into a union even though it is
not always inserted into either of its components.
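
The union example can be checked mechanically on a small universe.
The following Python sketch is only an illustration, not part of the
formal development: it assumes strong, independent semantics for
insert, restricts values to the two-element domain {0,1}, and uses
hypothetical helper names (all_relations, apply_ts, e). It computes
the Ins sets by the rules quoted above and then enumerates every
state to see whether (0,0) is present after ts.

```python
from itertools import combinations

DOMAIN = [0, 1]
TUPLES = [(a, b) for a in DOMAIN for b in DOMAIN]

def all_relations():
    """Every subset of TUPLES, i.e. every state of one binary relation."""
    return [set(c) for r in range(len(TUPLES) + 1)
            for c in combinations(TUPLES, r)]

def apply_ts(R, S):
    """ts = insert(R,(0,0)); assumed to touch only R and add one tuple."""
    return R | {(0, 0)}, set(S)

def e(R, S):
    """e = (R-S)+(R.S): union of a difference and an intersection."""
    return (R - S) | (R & S)

# Ins sets computed by the rules used in the text.
ins_R = {(0, 0)}
ins_S, del_S, cv_S = set(), set(), set()
ins_diff = ins_R & (del_S | cv_S)    # Ins(R-S) = Ins(R).(Del(S)+Cv(S))
ins_inter = ins_R & ins_S            # Ins(R.S), the derived rule
ins_e = ins_diff | ins_inter         # Ins(e1+e2) = Ins(e1)+Ins(e2)

# Semantic check: is (0,0) in e(ts(st)) for every enumerated state st?
always_inserted = all((0, 0) in e(*apply_ts(R, S))
                      for R in all_relations()
                      for S in all_relations())

print(ins_e, always_inserted)
```

On this finite universe the rules yield an empty Ins(e) while the
enumeration finds (0,0) in every resulting state, which is the gap
between soundness and completeness described above.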

The  situation  is  similar  to  the  one  of  Chapter  4  where  the 
formula  for  calculating  constraints  on  set  differences  needed  to 
know  about  equivalent  expressions.  In  order  to  properly 
calculate  Del  sets,  the  algorithm  must  be  able  to  determine  when 
expressions  are  equivalent.  As  we  saw  in  Chapter  4,  this  led  to 
the  undecidability  result  for  calculating  constraints.  We  have 
similar  results  here: 




Theorem 6.5. Let s ∈ RSch; let R(n) ∈ s, and let e1, e2 ∈ Exp(s)
of degree n such that R does not appear in e1 or e2. Then e1=e2
if and only if Wk-Del(ts,E,w), where ts is delete(R,w); E is
R+(e1-e2)+(e2-e1), and w is (-,...,-) ∈ BTup(n).

Proof. If e1=e2, then for every state st, E(st)=R(st). Since
R(ts(st))=∅ for every st, E(ts(st))=∅. In other words
##w•E(ts(st))=∅, and so Wk-Del(ts,E,w) is true.

Now suppose e1≠e2. This means there is some state st and
n-tuple w' such that either w' ∈ e1(st) and w' ∉ e2(st), or w' ∈
e2(st) and w' ∉ e1(st). In either case, w' ∈
((e1-e2)+(e2-e1))(st). Since R does not appear in either e1 or
e2, we also have w' ∈ ((e1-e2)+(e2-e1))(ts(st)). Hence, w' ∈
E(ts(st)), and also w' ∈ ##w•E(ts(st)). Therefore,
Wk-Del(ts,E,w) is false.

□

Theorem 6.6. The problem of determining the truth value of the
predicate Wk-Del is undecidable for arbitrary choice of its
arguments.

Proof. If there were a decision procedure for Wk-Del, then by
the above theorem, there would also be one for the equivalence
problem, which is unsolvable by Theorem 5.2.

□

Similar  results  hold  for  insertion: 

Theorem 6.7. Let s ∈ RSch; let R be an n-ary relation in s, and
let e1 and e2 be in Exp(s) of degree n in which R does not
appear. Then for any tuple w ∈ Tup(n), e1=e2 wrt w if and only
if Wk-Ins(ts,E,w), where ts is insert(R,w), and E is
R-((e1-e2)+(e2-e1)).

Proof. If e1=e2 wrt w, then for all states st, w ∉
((e1-e2)+(e2-e1))(st), and because R does not appear in e1 or e2,
we also have w ∉ ((e1-e2)+(e2-e1))(ts(st)). Thus for all st, w ∈
(R-((e1-e2)+(e2-e1)))(ts(st)), which yields Wk-Ins(ts,E,w) true.

If e1≠e2 wrt w, then for some st, either w ∈ (e1-e2)(st) or w
∈ (e2-e1)(st). In either case, w ∈ ((e1-e2)+(e2-e1))(st) and
also w ∈ ((e1-e2)+(e2-e1))(ts(st)). Therefore w ∉ E(ts(st)) and
so Wk-Ins(ts,E,w) is false.

□

Theorem 6.8. The problem of determining the truth value of the
predicate Wk-Ins is undecidable for arbitrary choice of its
arguments.

Proof. By Theorem 6.7, a decision procedure for Wk-Ins could be
used as a decision procedure for "equivalence wrt w", which is
impossible by Theorem 5.2.

□ 


Since it is the universal quantifier and the set difference
operator which are causing the trouble, we can try to obtain
completeness by eliminating set difference from our formulas.
Consider the mapping language without the set difference operator
(syntax II of Definition 4.5). We would like to have some kind
of determining state st' such that if we show w ∈ e(ts(st')),
then we know w ∈ e(ts(st)) for all states st. An arbitrary
choice for st' will not work since w might appear in e(ts(st'))
"by accident". But there is one special and simple state, call
it ⊘, such that no tuple appears in e(ts(⊘)) by coincidence.
Namely, ⊘ is such that R(⊘) is empty for every relation R.

The next theorem shows that ⊘ has the desired property.

Theorem 6.9. Let s ∈ RSch and assume RQ has independent and
strong semantics. If ⊘ is the empty state, that is, R(⊘)=∅ for
every R ∈ s, and if ts ∈ Ts(s) and e ∈ Exp(s) of degree n with
syntax II (no set difference), then for any tuple w ∈ VTup(n) and
any valuation x,

w/x/ ∈ e(ts(⊘)) => w/x/ ∈ Ins(ts,e)/x/.

From this it follows that

Wk-Ins(ts,e,w) => w ∈ Ins(ts,e).

Proof. Because the semantics of RQ are strong and independent,
we may write, for any state st and valuation x, R(ts(st;x)) =
R(st)+Ins(R)/x/-##Del(R)/x/. (This can be proved by induction on
the length of ts.) Therefore, R(ts(⊘;x)) = R(⊘)+Ins(R)/x/ -
##Del(R)/x/ = ∅ + Ins(R)/x/ - ##Del(R)/x/ = Ins(R)/x/. (Ins(e)
and Del(e) are always disjoint.)

The induction for the operators projection, restriction,
selection, cross product and union is straightforward.

The second part of the theorem follows because for any set S ⊆
VTup(n), if for all valuations x, w/x/ ∈ S/x/, then w ∈ S. To
see this, suppose w ∉ S. Let x be the valuation (0,1,2,3,...).
Then it is easy to see that w/x/ ∉ S/x/.

□


Of  course,  this  special  state  will  not  work  when  the  mapping 
language  has  set  difference,  as  can  be  seen  by  the  following 
example: 

s:  R(2), S(2)
e:  R-S
ts: insert(R,(0,1))

st1:      R(st1)=∅                S(st1)={(0,1)}
ts(st1):  R(ts(st1))={(0,1)}      S(ts(st1))={(0,1)}

Here (0,1) ∈ e(ts(⊘)) but (0,1) is not an element of e(ts(st))
for all states; in particular, (0,1) ∉ e(ts(st1)).
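
The role of the empty state as a determining state can be seen on a
small universe. The Python sketch below is a hedged illustration
only: it assumes strong, independent semantics, limits values to
{0,1}, and introduces hypothetical helpers (all_relations, apply_ts,
always_in, in_empty_state). It compares a difference-free expression
(a union) with the set difference of the example above.

```python
from itertools import combinations

TUPLES = [(a, b) for a in (0, 1) for b in (0, 1)]

def all_relations():
    """Every state of one binary relation over the domain {0,1}."""
    return [frozenset(c) for r in range(len(TUPLES) + 1)
            for c in combinations(TUPLES, r)]

def apply_ts(R, S):
    """ts = insert(R,(0,1)); assumed to add exactly this tuple to R."""
    return R | {(0, 1)}, S

def always_in(expr, w):
    """True if w is in expr(ts(st)) for every enumerated state st."""
    return all(w in expr(*apply_ts(R, S))
               for R in all_relations() for S in all_relations())

def in_empty_state(expr, w):
    """True if w is in expr(ts(empty state))."""
    return w in expr(*apply_ts(frozenset(), frozenset()))

def union(R, S):
    return R | S

def difference(R, S):
    return R - S

w = (0, 1)
print(in_empty_state(union, w), always_in(union, w))
print(in_empty_state(difference, w), always_in(difference, w))
```

For the union, membership in the empty-state result agrees with
membership in every state; for R-S the two disagree, since a state
with (0,1) already in S removes the tuple from the difference.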

The  next  theorem  proves  completeness  for  the  deletion  rules.  We 
may  assume  that  e  conforms  to  syntax  III  of  Definition  4.5  since 
it  contains  no  set  difference  operator. 

Theorem 6.10. Let s ∈ RSch; let e ∈ Exp(s) with syntax III
(Definition 4.5); let ts ∈ Ts(s), and let w ∈ VTup(n). Then

Wk-Del(ts,e,w) => w ∈ Del(ts,e) + Cv(e).

Proof. The contrapositive of the statement to be proved is:

w ∉ Del(e)+Cv(e) => ¬Wk-Del(ts,e,w).

To prove this, we will prove the statement:

If w ∉ Del(e) and z is a valuation such that w/z/ violates
no constraint of e, then for some state st,

##w/z/•e(ts(st;z)) ≠ ∅.

We use induction on e.

Suppose w ∉ Del(R). Define a tuple w' as follows: For every
domain X, if there is a Y ∈ X/= such that w[Y]≠'-', then let
w'[X]=w[Y]; if there is a VEQ X=V ∈ Drv(R), then w'[X]=V;
otherwise let w'[X]=v, where v is a value not appearing in ts.
Then we have w'>w, w'/z/ violates no constraints and w' ∉ #{w" :
delete(R,w") ∈ ts} - {w" : w"<w3 for some w3 ∈ Ins(R)}. This
last statement can be justified as follows: If we had w' ∈ #{w"
: delete(R,w") ∈ ts} - {w" : w"<w3 for some w3 ∈ Ins(R)}, then we
could reconstruct w from the operations of Mb as follows:
Suppose w'[X] was assigned a new value. Since this new value
does not appear in ts, there must be a w"<w' such that
delete(R,w") ∈ ts, and w"[X]='-'. Then the tuple w4 such that
w4[X]='-' and w4[Y]=w'[Y] for Y≠X has the property that w4 ∈ #{w"
: delete(R,w") ∈ ts} - {w" : w"<w3 for some w3 ∈ Ins(R)}. We
replace all new values in this manner by blanks, and apply the
operations of Mb to the result to get w.

Now w' ∉ #{w" : delete(R,w") ∈ ts} - {w" : w"<w3 for some w3 ∈
Ins(R)} means either that there is no w"<w' such that
delete(R,w") ∈ ts, or that w' ∈ Ins(R). In the first case, if st
is such that R(st) = {w'/z/}, then we will have w'/z/ ∈
R(ts(st;z)). In the second case we will have w'/z/ ∈ R(ts(st;z))
for any st.

Suppose w ∉ Del(e[X=V]), where e is a selection on a base
relation. Define a tuple w' by w'[Y]=w[Y] for Y≠X, and if
w[X]='-', then w'[X]=V; otherwise w'[X]=w[X]. Then w'>w, w'/z/
does not violate any constraints on e[X=V], and w' ∉ Del(e) since
w' ∈ Del(e) implies w ∈ Mb(e[X=V],Del(e)). By induction, there
is a state st such that ##w'/z/•e(ts(st;z)) ≠ ∅. Since w'[X]≠'-',
##w'/z/•e[X=V](ts(st;z)) ≠ ∅, and since w/z/<w'/z/,
##w/z/•e[X=V](ts(st;z)) ≠ ∅.

Suppose w ∉ Del(e⊗r) = Del(e)⊗#{(-,...,-)} + #{(-,...,-)}⊗Del(r),
where e is a cross product of selections and r is a selection.
Then p1(w) ∉ Del(e) and p2(w) ∉ Del(r). By induction, there is a
state st1 such that ##p1(w)/z/•e(ts(st1;z)) ≠ ∅. Suppose r is
R[X1=V1]...[Xn=Vn]. We proceed in a manner similar to Theorem
5.6 to get a w' ∈ r(ts(st1;z)) such that w' > p2(w). Then we
will have ##w/z/•(e⊗r)(ts(st;z)) ≠ ∅.

Suppose w ∉ Del(e[X=Y]) = Mb(e[X=Y],Del(e)). Define a tuple
w' as follows: (i) If there is a Z ∈ X/= (with respect to
e[X=Y]) such that w[Z]≠'-', then w'[X]=w'[Y]=w[Z]; (ii) if there
is a VEQ X=V ∈ Drv(e[X=Y]), then w'[X]=w'[Y]=V; (iii) otherwise
w'[X]=w'[Y]=v, where v is a new value. This will always be
possible because Del sets are generated from a finite set of
tuples using the #-operator. Also, w'[Z]=w[Z] for Z≠X and Z≠Y.
Then w'>w, w'/z/ does not violate any constraints of e[X=Y], and
w' ∉ Del(e). We justify this last statement as follows: In the
first two cases, if w' ∈ Del(e), it is clear from the definition
of Mb that w ∈ Mb(e[X=Y],Del(e)) = Del(e[X=Y]). In the third
case, w' ∉ Del(e) since it contains new values.


If w ∉ Del(e[X]) = Del(e)[X~='-'][X], then the tuple w' such
that w'[X]=w, w'[X~]='-', is not in Del(e). By induction, there
is a state st such that ##w'/z/•e(ts(st;z)) ≠ ∅. From this we have
##w/z/•e[X](ts(st;z)) ≠ ∅.

If w ∉ Del(e1+e2) = Del(e1)•Del(e2), then w ∉ Del(e1) or w ∉
Del(e2). In the first case, we have by induction a state st such
that ##w/z/•e1(ts(st;z)) ≠ ∅ which implies
##w/z/•(e1+e2)(ts(st;z)) ≠ ∅. The second case is analogous.

□


6.5.  Summary and Conclusions

In this chapter we have defined and studied data model
operations  and  operation  mappings. 

We  began  in  Section  1  by  defining  the  set-function  RQ  of 
operations  for  the  relational  data  model  RDra.  We  discussed  the 
various  semantics  which  could  be  given  to  RQ.  Six  different 
semantics  were  defined,  depending  on  whether  operations  have 
inter-relational side effects or intra-relational side effects.
Only  the  semantics  having  no  side  effects  were  deterministic,  but 
we  argued  that  the  others  were  also  important  to  study  in  the 
context  of  an  advanced  database  architecture  as  described  in 
Chapter  1. 




Operation mappings were defined to translate operations on the
view into a sequence of operations on the base schema. It was
acknowledged that state-dependent translations are a legitimate
mapping type, but such generality was avoided in this initial
investigation.

The  main  problem  studied  in  this  chapter  was  the  determination 
of  the  type  3  property  for  operation  mappings.  This  property 
states  that  the  translation  has  the  desired  effect  as  long  as  the 
initial  and  final  database  states  satisfy  the  schema  constraints. 
In  Section  2  we  reduced  this  problem  to  a  set  of  seven  simpler 
problems.  Like  the  similar  reduction  in  Chapter  4,  these  seven 
problems  no  longer  dealt  directly  with  state  and  operation 
mappings,  but  with  relational  algebra  expressions  (and 
translation  sequences)  . 

Section  3  studied  two  of  the  above  problems  which  were  the 
most  primitive:  when  specified  tuples  were  always  inserted  or 
always deleted, without regard to what else happened. In their
full  generality,  both  problems  were  shown  to  be  undecidable.  We 
presented  algorithms  of  an  algebraic  flavour  which  were  sound  for 
full  relational  algebra,  and  in  Section  4  we  showed  they  were 
complete  for  relational  algebra  without  set  difference. 

We  can  conclude  a  number  of  things  from  the  results  of  this 
chapter. 

First, we can say that our general algebraic approach works.
In order to solve the problems we faced, finite or recursive sets
were defined and manipulated using algebraic operators. This
approach is conceptually economical and lends itself well to
mathematical proofs.

We saw that undecidability was a problem just as it was for
the state mappings in Part III: Calculating properties of
operation mappings can be used to calculate the equivalence of
relational algebra expressions. So again the power of the
structure mapping language caused problems to be unsolvable, even
though the operation mapping language was very simple.

In order to prove the completeness property for Ins and Del,
we needed to assume independent and strong semantics. This means
that complete rules for multiple levels of mappings are not
possible unless all levels but the final one have strong
properties (properties to be discussed in the next chapter).




CHAPTER 7


Further  Properties  of  Operation  Mappings 


In the last chapter we defined relational operations and
operation mappings. We devised sound and complete procedures
which determined when an operation mapping always inserted a
specified tuple into a view or deleted specified tuples from a
view. These procedures work for correctness of an operation
mapping with respect to weak semantics. In a real database
system, however, the most useful semantics would be the medium
and strong ones. Hence it is important to have procedures for
these situations. Weak semantics may not be too useful in
itself, but it is a subcondition of medium and strong semantics,
and we therefore can expect to be able to use the procedures from
the last chapter in the context of medium and strong semantics.

This  chapter  contains  four  sections.  In  the  first  section  we 
propose an extension of the previous chapter's procedures to deal
with  medium  and  strong  semantics.  We  also  show  that  this 
extension  is  sound. 

In Section 2 we consider the problem of completeness. As we
can  expect  by  now,  the  problem  is  undecidable  with  set  difference 
as  part  of  the  mapping  language.  Yet  even  with  this  operator 
removed,  the  procedures  are  not  complete.  We  discuss  this 
problem  and  how  it  relates  to  the  join  operator. 

In  Section  3  we  briefly  consider  operation  mappings  which  do 
not  depend  on  schema  constraints  being  satisfied.  This  type  of 
mapping  would  be  important  because  it  would  be  less  affected  by 
the  evolution  of  the  database. 

Section  4  contains  a  summary  and  conclusions. 




7.1  Medium  and  Strong  Type  3  Mappings 

We have defined recursive sets Ins and Del which characterize
the predicates Wk-Ins and Wk-Del. We now want to deal with the
four predicates related to medium and strong semantics. The
first question we ask is, can the sets Ins and Del be used
directly in characterizing the other predicates? For Med-Ins
(see Definition 6.5) we want to know that the given tuple w is
inserted into the expression e and that nothing is deleted from
e. For Med-Del we want to know that all w'>w are deleted from e
and nothing is inserted into e. For Str-Ins we want to know that
only w is inserted into e and nothing is deleted from e, and for
Str-Del we want to know that all w'>w and only these are deleted
from e and nothing is inserted into e. One might, then, try to
prove the following:

Med-Ins(ts,e,w) if and only if w ∈ Ins(ts,e) • Del(ts,e)=∅,

Med-Del(ts,e,w) if and only if w ∈ Del(ts,e) • Ins(ts,e)=∅,

Str-Ins(ts,e,w) if and only if {w}=Ins(ts,e) • Del(ts,e)=∅, and

Str-Del(ts,e,w) if and only if {w}=Del(ts,e) • Ins(ts,e)=∅.


These clauses are not valid, however, because the statements
Ins(e)=∅ and Del(e)=∅ are not sufficient to assure the absence of
insertions and deletions, respectively. To illustrate, consider
the following examples:

s:  R(3)
e:  R[1,2]
ts: insert(R,(0,0,0));delete(R,(1,1,1))

Ins(e) = Ins(R)[1,2]
       = {(0,0,0)}[1,2]
       = {(0,0)}

Del(e) = Del(R)[3='-'][1,2]
       = {(1,1,1)}[3='-'][1,2]
       = ∅[1,2]
       = ∅


Thus we have (0,0) ∈ Ins(e) and Del(e)=∅. However, consider the
state st where R contains only the tuple (1,1,1) so that e(st) =
{(1,1)}. In ts(st), R contains only (0,0,0), and so e(ts(st)) =
{(0,0)}, which no longer contains (1,1).
Hence, a deletion has occurred to e and it is not possible for
Med-Ins(ts,e,(0,0)) or Str-Ins(ts,e,(0,0)) to be true even though
Del(e)=∅. The general principle illustrated here is that Del
records tuples which are always deleted. In the case of
projection, every extension of a tuple must be deleted from the
unprojected relation in order for that tuple to always be
deleted. However, the tuple may "sometimes" be deleted when it
has only one extension in the unprojected relation and this one
extension is deleted.
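
The projection example can be replayed directly. This Python sketch
is an illustration under stated assumptions: strong semantics for
insert and delete, the string "-" standing for the blank, and helper
names (project, select_blank, apply_ts) that are ours rather than
part of the formalism. It evaluates the rule for Del(e) and then
compares e before and after ts on the state where R holds only
(1,1,1).

```python
BLANK = "-"

def project(rel, cols):
    """Projection onto 1-based domain numbers cols."""
    return {tuple(t[c - 1] for c in cols) for t in rel}

def select_blank(rel, cols):
    """Selection [cols='-']: keep tuples that are blank in every col."""
    return {t for t in rel if all(t[c - 1] == BLANK for c in cols)}

def apply_ts(R):
    """ts = insert(R,(0,0,0)); delete(R,(1,1,1)), applied in order."""
    return (R | {(0, 0, 0)}) - {(1, 1, 1)}

# Rule-based sets for e = R[1,2].
ins_e = project({(0, 0, 0)}, [1, 2])                     # Ins(R)[1,2]
del_e = project(select_blank({(1, 1, 1)}, [3]), [1, 2])  # Del(R)[3='-'][1,2]

# One concrete state: R contains only (1,1,1).
st = {(1, 1, 1)}
before = project(st, [1, 2])
after = project(apply_ts(st), [1, 2])
deleted = before - after

print(ins_e, del_e, deleted)
```

Del(e) comes out empty because (1,1,1) has no blank in domain 3, yet
the tuple (1,1) disappears from e in this state: it is deleted
sometimes, though not always.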

Consider  a  second  example: 

s:  R(2), S(2)
e:  R⊗S
ts: delete(R,(0,0));insert(R,(1,1))

Ins(S) = ∅
Ins(e) = Ins(R)⊗Ins(S)
       = Ins(R)⊗∅
       = ∅

Del(R) = {(0,0)}
Del(S) = ∅
Del(e) = Del(R)⊗#{(-,-)} + #{(-,-)}⊗Del(S)
       = #{(0,0,-,-)} + ∅
       = #{(0,0,-,-)}

Thus we have (0,0,-,-) ∈ Del(e) and Ins(e) = ∅. However, take
the state st where R is empty and S contains (1,2). Then e(st) =
∅. In ts(st), R contains (1,1) and S still contains (1,2), and
so e(ts(st)) contains (1,1,1,2). Thus an insertion has occurred
to e even though Ins(e) = ∅. The reason is that in order for a
tuple to always be inserted into a cross product, both of its
projections must always be inserted into the components. If a
tuple is always inserted into one component, it can sometimes
connect with a tuple in the other component and cause a spurious
insertion.
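
The spurious insertion into a cross product can be reproduced in a
few lines. The Python sketch below is illustrative only; it assumes
strong, independent semantics, takes Ins(R) = {(1,1)} as the set the
rules would assign for this ts (an assumption, since the text leaves
Ins(R) unexpanded), and uses the hypothetical helpers cross and
apply_ts.

```python
def cross(R, S):
    """Cross product: concatenate every tuple of R with every tuple of S."""
    return {r + s for r in R for s in S}

def apply_ts(R, S):
    """ts = delete(R,(0,0)); insert(R,(1,1)); S is left untouched."""
    return (R - {(0, 0)}) | {(1, 1)}, set(S)

# Rule-based Ins set for e = R x S.
ins_R, ins_S = {(1, 1)}, set()
ins_e = cross(ins_R, ins_S)          # Ins(R) x Ins(S) = empty

# The state from the text: R empty, S contains (1,2).
R, S = set(), {(1, 2)}
before = cross(R, S)
after = cross(*apply_ts(R, S))
inserted = after - before

print(ins_e, inserted)
```

The rule gives an empty Ins(e), while the concrete state gains the
tuple (1,1,1,2), matching the discussion above.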

Examples can also be given to show that Ins(ts,e) = {w} and
Del(ts,e) = {w} are not sufficient to guarantee that no other
insertions or deletions, respectively, occur.




We  need  a  more  sophisticated  algorithm  in  order  to  decide  when 
no  insertions  or  no  deletions  occur.  An  informal  discussion  of 
the  proper  conditions  is  presented,  and  then  we  provide  exact 
definitions. 

When the semantics are strong, "Ins(R)=∅" will accurately
determine when no insertions occur to a base relation R. With
medium semantics, nothing is inserted into a base relation R if
no operation of the form insert(R,w) appears in ts. With weak
semantics, we must specify no operations at all on R to be
assured of there being no insertions into R. The analogous
statements are true for deletion.

Nothing is inserted into a projection as long as nothing is
inserted into the unprojected relation. The analogous remark is
true for deletion from projections, since if a tuple w is deleted
from e, this deletion will be visible in e[X] whenever there is
no other tuple in e with the same values as w in the X domains.

For  restrictions  and  selections,  there  are  two  situations  when 
no  insertions  occur.  Insertions  will  not  occur  if  nothing  is 
inserted into the unrestricted or unselected relation.
Insertions  will  also  not  occur  as  long  as  the  ones  that  do  occur 
are  not  visible  in  the  selection  or  restriction,  and  as  long  as 
these  invisible  insertions  do  not  trigger  any  other,  possibly 
visible  insertions.  (The  idea  of  triggered  insertions  and 
deletions  will  be  illustrated  below.)  The  analogous  statements 
hold  for  no  deletions  from  restrictions  and  selections:  Either 
deletions  do  not  occur  from  the  unrestricted  or  unselected 
relations,  or  they  do,  but  they  are  invisible  and  do  not  trigger 
any  visible  deletions. 

Nothing  is  inserted  into  a  cross  product  when  both  components 
have  nothing  inserted  into  them.  As  the  second  example  above 
illustrated,  both  components  must  be  insertion  free,  or 
coincidental  insertions  can  occur.  Clearly,  the  analogous 
condition  must  hold  for  absence  of  deletions  from  cross  products. 

In order for there to be no insertions into a union, both
components must have no insertions. In order for there to be no
deletions from a union, each component must likewise have no
deletions. A deletion from only one component may not always be
visible in the union, but it will be if the tuple deleted is not
present in the other component.

For a set difference, the requirement of no insertions means
that there must be no insertions into the first and no deletions
from the second component. The analogous statement holds for
deletions.

We have essentially described two predicates, Noins(ts,e) and
Nodel(ts,e). To see what these predicates are intended to mean
in terms of set-theoretic relations, note that Noins(ts,e)
implies that the only changes possible for e are deletions.
Similarly, Nodel(ts,e) implies that the only changes allowed for
e are insertions. This means that if Noins(ts,e) is true, then
for any state st (and valuation x)

e(ts(st;x)) ⊆ e(st),

i.e., the new state contains only old tuples. Also, if
Nodel(ts,e) is true, then for any state st (and valuation x)

e(st) ⊆ e(ts(st;x)),

i.e., the new state contains all of the old tuples.
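
The two containments can be tested by enumeration on a small
universe. The Python sketch below checks only these set-theoretic
consequences, not the predicates Noins and Nodel themselves; the
single relation R(2) over {0,1}, the strong semantics, and the names
only_deletes and only_inserts are assumptions made for illustration.

```python
from itertools import combinations

def all_relations(tuples):
    """Every subset of the given tuples, as frozensets."""
    return [frozenset(c) for r in range(len(tuples) + 1)
            for c in combinations(tuples, r)]

def only_deletes(expr, ts, states):
    """e(ts(st)) is contained in e(st) for every enumerated state."""
    return all(expr(ts(st)) <= expr(st) for st in states)

def only_inserts(expr, ts, states):
    """e(st) is contained in e(ts(st)) for every enumerated state."""
    return all(expr(st) <= expr(ts(st)) for st in states)

TUPLES = [(a, b) for a in (0, 1) for b in (0, 1)]
STATES = all_relations(TUPLES)       # states of a single relation R(2)

def identity(R):
    """The expression e = R."""
    return R

def insert_00(R):
    """ts = insert(R,(0,0))."""
    return R | {(0, 0)}

def delete_00(R):
    """ts = delete(R,(0,0))."""
    return R - {(0, 0)}

print(only_inserts(identity, insert_00, STATES),
      only_deletes(identity, insert_00, STATES))
print(only_deletes(identity, delete_00, STATES),
      only_inserts(identity, delete_00, STATES))
```

An insert-only sequence satisfies the second containment and fails
the first; a delete-only sequence does the opposite.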


The  idea  of  insertions  and  deletions  being  triggered  by 
invisible  insertions  and  deletions  can  be  illustrated  by  the 
following  example: 

s:  R(2), S(2)
e:  (R⊗S)[4='0']
ts: insert(R,(5,6));insert(S,(7,8))

Ins(R⊗S) = {(5,6,7,8)}
Ins(e) = {(5,6,7,8)}[4='0']
       = ∅

There is an insertion into the unselected relation, but it
violates the constraint 4='0' of e, and so it does not appear in
e. However, as in a previous example, an insertion into e may
occur if the state of S happens to contain any tuple with a '0'
in domain 2. (An analogous example can be given for deletions.)
What we would like to have known was that (5,6,7,8) was the only
tuple ever inserted into the expression. Then we would know that
Ins(e) = ∅ really meant no insertions into e. Thus we see that
the problem of spurious tuples appearing and disappearing is not
always attributable to the top level relational algebra operator.

We attack this problem by defining two properties (i.e.,
predicates) Iok and Dok (Insertion okay and Deletion okay). The
meaning of Iok is that if Iok(ts,e) is true, then only elements
of Ins(ts,e) are ever inserted into e. Similarly, truth of
Dok(ts,e) implies that only the elements of ##Del(ts,e) are ever
deleted from e. In mathematical terms, given a state st, the set
of tuples which are inserted into e is e(ts(st))-e(st) and the
set of those which are deleted is e(st)-e(ts(st)). Therefore,
the desired properties of Iok and Dok are that these sets should
be subsets of Ins(ts,e) and ##Del(ts,e), respectively. That is,
given any state st, if Iok(ts,e), then e(ts(st))-e(st) ⊆
Ins(ts,e) and if Dok(ts,e), then e(st)-e(ts(st)) ⊆ ##Del(ts,e).
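
The containment required of Iok can be tested on a concrete state.
The Python sketch below revisits the expression (R⊗S)[4='0'] with
ts = insert(R,(5,6));insert(S,(7,8)); it assumes strong, independent
semantics, encodes the constant '0' as the integer 0, and picks the
state S = {(9,0)} ourselves as a witness. The helper names are
hypothetical.

```python
def cross(R, S):
    """Cross product of two relations."""
    return {r + s for r in R for s in S}

def e(R, S):
    """e = (R x S)[4='0']: product tuples whose 4th domain equals 0."""
    return {t for t in cross(R, S) if t[3] == 0}

def apply_ts(R, S):
    """ts = insert(R,(5,6)); insert(S,(7,8))."""
    return R | {(5, 6)}, S | {(7, 8)}

# Rule-based Ins(e): select 4='0' from Ins(R x S) = {(5,6,7,8)}.
ins_e = {t for t in cross({(5, 6)}, {(7, 8)}) if t[3] == 0}

# A state in which S already holds a tuple with 0 in its second domain.
R, S = set(), {(9, 0)}
inserted = e(*apply_ts(R, S)) - e(R, S)

# The property wanted of Iok: everything inserted lies in Ins(ts,e).
iok_property_holds = inserted <= ins_e

print(ins_e, inserted, iok_property_holds)
```

Here the inserted set is {(5,6,9,0)} while Ins(e) is empty, so the
containment fails for this state; Iok should therefore come out
false for this expression.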

It may seem that all we need to define Noins and Nodel are the
respective conditions

Iok(e) • Ins(e)=∅, and
Dok(e) • Del(e)=∅.

(For notational convenience, we will sometimes omit the argument
ts from Ins, Del and the others.) It is true that these clauses
imply Noins(ts,e) and Nodel(ts,e), respectively. However, the
predicates Noins and Nodel are themselves needed to define Iok
and Dok; that is, the definitions of Noins, Nodel, Iok and Dok
are mutually recursive. To see why this is so, consider the
following simple example:

s:  R(2), S(2), T(4)
e:  (R[2=1]S) + T
ts: insert(T,(2,4,6,8))

The tuple (2,4,6,8) is the only new tuple ever to appear in e
from the action of ts, so we want Iok(e) to be true.  Yet if we
only look at the expression e, we would calculate Iok(e) = false
because one of the terms is a join.  A correct calculation of
Iok(e) must note that although a component of e is a join,
nothing is inserted into the join.  Thus we see that the
definition of Iok depends on the predicate Noins.

We will now discuss in more detail the proper specification of
Iok and Dok.

For a base relation R, Iok(R) and Dok(R) will always be true,
provided that the semantics are independent and strong.  (More
will be said about this later.)

With projection, Iok(e[X]) will be true if Iok(e) is, since
spurious insertions into e will always be visible in e[X].  For
Dok(e[X]) to be true, it is necessary that Dok(e) be true,
since spurious deletions from e can sometimes be visible in e[X].
But it must also be true that every tuple in Del(e) has blanks in
the domains projected out (i.e., in X~).  Take the following
situation:

s:  R(4)
e:  R[1,2]
ts: delete(R,(7,8,-,0))

Del(R) = {(7,8,-,0)}
Del(e) = {(7,8,-,0)}[3,4='-'][1,2]
       = ∅

Here, Dok(e) will be false even though Dok(R) is true because ts
can sometimes cause deletions from R[1,2].  In general, if every
tuple in Del(e) has blanks in the X~ domains, the selection
Del(e)[X~='-'] used to get Del(e[X]) will not lose anything, i.e.,
for every tuple t in Del(e), t[X] will be in Del(e[X]).
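The projection example can be replayed on concrete states.  The sketch below is in Python and rests on an assumption about the semantics of delete: a deletion pattern with a blank removes every tuple that agrees with the pattern on its non-blank domains.  The helper names (BLANK, matches, delete, project) are illustrative and not part of the formalism.

```python
BLANK = None  # stands for the blank '-' in a deletion pattern


def matches(pattern, tup):
    """True if tup agrees with pattern on every non-blank domain."""
    return all(p is BLANK or p == t for p, t in zip(pattern, tup))


def delete(rel, pattern):
    """Assumed semantics: remove every tuple matching the pattern."""
    return {t for t in rel if not matches(pattern, t)}


def project(rel, cols):
    """Projection on 1-based domain numbers."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


pattern = (7, 8, BLANK, 0)

# State 1: the only tuple with (7,8) in domains 1,2 is deleted.
st1 = {(7, 8, 9, 0)}
lost1 = project(st1, [1, 2]) - project(delete(st1, pattern), [1, 2])

# State 2: another tuple keeps (7,8) alive in the projection.
st2 = {(7, 8, 9, 0), (7, 8, 1, 1)}
lost2 = project(st2, [1, 2]) - project(delete(st2, pattern), [1, 2])

print(lost1, lost2)
```

In the first state the tuple (7,8) disappears from R[1,2] although Del(e) is empty, which is the spurious deletion that makes Dok(e) false; in the second state nothing is lost, which is why the deletion occurs only "sometimes".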

For selection and restriction, if Iok or Dok is true for the
unselected (unrestricted) relation, the same will be true for the
selection (restriction).  As noted before, spurious changes can
always be visible in this case.

For a cross product, the only way Iok can be true is for there
to be no insertions into either component, since any insertion
into one component will link up with any element of the other
component.

Deletion from a cross product can be effected in three ways:
by deleting from e1, by deleting from e2, or by deleting from
both e1 and e2.  There are three corresponding alternatives for
Dok(e1⊗e2) to be true.  Namely, Dok is true on e1 and nothing is
deleted from e2; Dok is true on e2 and nothing is deleted from
e1, or Dok is true on both e1 and e2.

There are three ways to effect an insertion into a union
e1+e2:  by inserting into e1, by inserting into e2 or by
inserting into both e1 and e2.  There are three corresponding
possibilities for Iok(e1+e2) to be true:  Iok can be true for e1
with no insertions to e2; Iok can be true for e2 with no
insertions to e1, or Iok can be true for both e1 and e2.  For
Dok, we have a situation similar to Dok on projections:  First,
Dok must be true for each component.  Second, we cannot allow any
tuples to be lost in calculating Del(e1+e2).  More precisely, it
must be the case that Del(e1) = Del(e2) since Del(e1+e2) =
Del(e1)∩Del(e2).  If Del(e1) does not equal Del(e2), say if w ∈
Del(e1)-Del(e2), then w will not be in Del(e1+e2), and w may
cause spurious deletions from the union when the state of e2 does
not contain members of e1 deleted by w.

Lastly, Iok is true for a difference e1-e2 only if no spurious
insertions can occur for e1 (Iok(e1)=true), and no spurious
deletions can occur from e2 (Dok(e2)=true).  There is also a "no
tuple loss" condition; namely, the tuples inserted into e1 must
be the same as the tuples deleted from e2, i.e., we must have
Ins(e1) = ##Del(e2).  The argument is similar to the other two
"no tuple loss" arguments.

Now that we have given an informal discussion of the four
predicates Noins, Nodel, Iok and Dok, we will define them and
prove their properties formally.  The proof of the theorem is
technically a simultaneous induction on e, but we group together
the arguments for each predicate.

Definition 7.1.  Let s ∈ RSch; let e ∈ Exp(s), and let ts ∈
Ts(s).  The predicates Noins, Nodel, Iok and Dok, which take two
arguments ts and e (with s implicit and ts often omitted), are
defined as follows:

[1] Noins(R) = if RQ has strong semantics then (Ins(R)=∅)
               else if RQ has medium semantics
                    then (insert(R,w) ∉ ts for any w)
               else /* RQ has weak semantics */
                    (op(R,w) ∉ ts for any w)
[2] Noins(e[X]) = Noins(e)
[3] Noins(e[X=Y]) = Noins(e) + (Iok(e) • Ins(e) ⊆ Cv(e[X=Y]))
[4] Noins(e[X=V]) = Noins(e) + (Iok(e) • Ins(e) ⊆ Cv(e[X=V]))
[5] Noins(e1⊗e2) = Noins(e1) • Noins(e2)
[6] Noins(e1+e2) = Noins(e1) • Noins(e2)
[7] Noins(e1-e2) = (Noins(e1) • Nodel(e2)) +
                   (Iok(e1) • Ins(e1) ⊆ Ins(e2) • Nodel(e2)) +
                   (Noins(e1) • Dok(e2) • #Del(e2) ⊆ #Del(e1))

[1] Nodel(R) = if RQ has strong semantics then (Del(R)=∅)
               else if RQ has medium semantics
                    then (delete(R,w) ∉ ts for any w)
               else /* RQ has weak semantics */
                    (op(R,w) ∉ ts for any w)
[2] Nodel(e[X]) = Nodel(e) + (Dok(e) • ##Del(e)[X] ⊆ Ins(e)[X])
[3] Nodel(e[X=Y]) = Nodel(e) + (Dok(e) • Del(e) ⊆ Cv(e[X=Y]))
[4] Nodel(e[X=V]) = Nodel(e) + (Dok(e) • Del(e) ⊆ Cv(e[X=V]))
[5] Nodel(e1⊗e2) = Nodel(e1) • Nodel(e2)
[6] Nodel(e1+e2) = (Nodel(e1) • Nodel(e2)) +
                   (Nodel(e1) • Dok(e2) • ##Del(e2) ⊆ Ins(e1)) +
                   (Nodel(e2) • Dok(e1) • ##Del(e1) ⊆ Ins(e2))
[7] Nodel(e1-e2) = Nodel(e1) • Noins(e2)

[1] Iok(R) = true for strong semantics, otherwise false
[2] Iok(e[X]) = Iok(e)
[3] Iok(e[X=Y]) = Iok(e)
[4] Iok(e[X=V]) = Iok(e)
[5] Iok(e1⊗e2) = Noins(e1) • Noins(e2)
[6] Iok(e1+e2) = (Iok(e1) • Iok(e2)) + (Iok(e1) • Noins(e2)) +
                 (Noins(e1) • Iok(e2))
[7] Iok(e1-e2) = Iok(e1) • Dok(e2) • Ins(e1) = ##Del(e2)

[1] Dok(R) = true for strong semantics, otherwise false
[2] Dok(e[X]) = Dok(e) • Del(e) = Del(e)[X~='-']
[3] Dok(e[X=Y]) = Dok(e)
[4] Dok(e[X=V]) = Dok(e)
[5] Dok(e1⊗e2) = (Dok(e1) • Dok(e2)) + (Nodel(e1) • Dok(e2)) +
                 (Dok(e1) • Nodel(e2))
[6] Dok(e1+e2) = Dok(e1) • Dok(e2) • #Del(e1)=#Del(e2)
[7] Dok(e1-e2) = (Dok(e1) • Iok(e2)) + (Dok(e1) • Noins(e2)) +
                 (Nodel(e1) • Iok(e2))

□
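The mutual recursion between Noins and Iok can be seen in a small evaluator.  The following Python sketch covers only a fragment of Definition 7.1 and makes several assumptions that are not in the text: strong semantics throughout; Ins(R) for a base relation is supplied as a dict ins_base from relation names to the tuples that ts inserts; expressions are nested tuples such as ("cross", e1, e2); and for restriction only the first disjunct of Noins clause [3] is used, which makes the test weaker but not unsound.  Degrees of union operands are not checked.

```python
def noins(e, ins_base):
    """Fragment of Noins: clauses [1], [2], [5], [6] and the first disjunct of [3]."""
    kind = e[0]
    if kind == "rel":                        # [1], strong semantics: Ins(R) is empty
        return not ins_base.get(e[1])
    if kind in ("proj", "restrict"):         # [2]; [3] without its Cv disjunct
        return noins(e[1], ins_base)
    if kind in ("cross", "union"):           # [5], [6]
        return noins(e[1], ins_base) and noins(e[2], ins_base)
    raise ValueError(f"unsupported operator: {kind}")


def iok(e, ins_base):
    """Fragment of Iok: clauses [1], [2], [3], [5], [6]."""
    kind = e[0]
    if kind == "rel":                        # [1], strong semantics
        return True
    if kind in ("proj", "restrict"):         # [2], [3]
        return iok(e[1], ins_base)
    if kind == "cross":                      # [5]: Iok depends on Noins
        return noins(e[1], ins_base) and noins(e[2], ins_base)
    if kind == "union":                      # [6]
        a, b = e[1], e[2]
        return ((iok(a, ins_base) and iok(b, ins_base))
                or (iok(a, ins_base) and noins(b, ins_base))
                or (noins(a, ins_base) and iok(b, ins_base)))
    raise ValueError(f"unsupported operator: {kind}")


# The example e = (R[2=1]S) + T, with the join written as (R x S)[2=3].
join = ("restrict", ("cross", ("rel", "R"), ("rel", "S")), 2, 3)
e = ("union", join, ("rel", "T"))

print(iok(e, {"T": {(2, 4, 6, 8)}}))   # only T receives an insertion
print(iok(e, {"R": {(1, 2)}}))         # an insertion reaches the join
```

With an insertion into T only, the join component satisfies Noins and the union is accepted; once R receives an insertion, the cross product clause rejects the expression.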


The first theorem of the chapter is a type of soundness
theorem:  Truth of one of the predicates implies the respective
property on states:

Theorem 7.1.  Let s ∈ RSch; let e be an expression over s, and
let ts be a translation sequence.  Then as long as the semantics
are independent and at least medium:

(i)   Noins(ts,e) => (∀st) e(ts(st)) ⊆ e(st);
(ii)  Nodel(ts,e) => (∀st) e(st) ⊆ e(ts(st));
(iii) Iok(ts,e) => (∀st) e(ts(st))-e(st) ⊆ Ins(ts,e), and
(iv)  Dok(ts,e) => (∀st) e(st)-e(ts(st)) ⊆ ##Del(ts,e).

(For notational convenience, valuations are implicit.)

Proof.  This proof appears in the Appendix.

□

Having defined and proved the basic properties of Noins,
Nodel, Iok and Dok, we can give sufficient conditions for the
truth of the predicates Str-Ins, Str-Del, Med-Ins, Med-Del and
Indep.

Theorem 7.2.  Let s ∈ RSch; let e ∈ Exp(s); let ts ∈ Ts(s), and
let w ∈ VTup(n).  Then as long as the semantics are independent
and at least medium:

(i)   Ins(ts,e) = {w} • Iok(ts,e) • Nodel(ts,e) => Str-Ins(ts,e,w)
(ii)  Del(ts,e) = {w} • Dok(ts,e) • Noins(ts,e) => Str-Del(ts,e,w)
(iii) w ∈ Ins(ts,e) • Nodel(ts,e) => Med-Ins(ts,e,w)
(iv)  w ∈ Del(ts,e) • Noins(ts,e) => Med-Del(ts,e,w)
(v)   Noins(ts,e) • Nodel(ts,e) => Indep(ts,e)

Proof.  The proof is clear from Theorem 7.1.

□


7.2.  Completeness of the Procedures

We have shown that the predicates Noins, Nodel, Iok and Dok
are sound, that is, that their being true implies the desired
properties on all states.  As we have done before, we will also
ask if the formulas are complete:  If the properties on the
right-hand sides of the implications of Theorem 7.1 are true, are
the predicates on the left-hand sides also true?

It is not surprising that with the set difference operator,
the answer is no.  The following is a counterexample:

s:  T(2)
e:  (T⊗T)[1,2]-T
ts: insert(T,(v1,v2))

We calculate:

Iok(T) = true
Ins(T) = {(v1,v2)}
Noins(T) = false
Nodel(T) = true
Ins(T⊗T) = {(v1,v2,v1,v2)}
Ins((T⊗T)[1,2]) = {(v1,v2)}
Iok(T⊗T) = Noins(T) • Noins(T) = false • false
         = false
Iok((T⊗T)[1,2]) = false
Noins((T⊗T)[1,2]) = Noins(T⊗T) = false • false
                  = false
Noins(e) = (Noins((T⊗T)[1,2]) • Nodel(T)) +
           (Iok((T⊗T)[1,2]) • Ins((T⊗T)[1,2]) ⊆ Ins(T))
         = (false • true) + (false • true)
         = false

Thus we have Noins(e) false, but, clearly, e(st)=∅ for every
state st, and hence e(ts(st;x)) ⊆ e(st) for all states st and
valuations x.
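That e(st) is empty in every state can be confirmed by enumeration over a small domain.  The sketch below, in Python, assumes a two-element domain {0,1} and fixes the inserted tuple to (0,1) in place of (v1,v2); both choices are illustrative, and the helper names are not from the text.

```python
from itertools import combinations, product


def cross(a, b):
    """Cross product of two relations given as sets of tuples."""
    return {x + y for x in a for y in b}


def project(rel, cols):
    """Projection on 1-based domain numbers."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def e(T):
    """The expression (T x T)[1,2] - T."""
    return project(cross(T, T), [1, 2]) - T


domain = [0, 1]
all_tuples = list(product(domain, repeat=2))
# Every state of the binary relation T over the sample domain.
states = [set(c) for r in range(len(all_tuples) + 1)
          for c in combinations(all_tuples, r)]

always_empty = all(e(T) == set() for T in states)
# Does inserting (0,1) into T ever add a tuple to e?
grows = any(e(T | {(0, 1)}) - e(T) for T in states)

print(always_empty, grows)
```

The expression never grows, so the property on the right-hand side of Theorem 7.1 (i) holds on these states even though the calculation gave Noins(e) = false.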

We can give similar counterexamples for the other predicates,
Nodel, Iok and Dok.  It is, in fact, the case that two properties
we would like these predicates to characterize are undecidable
when set difference is included in the set of operators.  We will
prove this in the following theorem:

Theorem 7.3.  Let s ∈ RSch; let e1, e2 ∈ Exp(s) of degree n and
let R(n) ∈ s such that R does not appear in e1 or e2.  Then the
following hold:

(i) e1 = e2 if and only if (∀st) E1(ts1(st)) ⊆ E1(st), where
E1 is (e1-e2)+(e2-e1)-R, and ts1 is delete(R,(-,...,-));

(ii) for any w ∈ Tup(n), e1=e2 wrt w if and only if (∀st)
E1(st) ⊆ E1(ts2(st)), where E1 is as above, and ts2 is
insert(R,w).

Proof.  (i) Suppose e1=e2.  Then for any state st,
(e1-e2)+(e2-e1)(st)=∅ and R(ts1(st))=∅.  Hence E1(ts1(st))=∅, and
therefore E1(ts1(st)) ⊆ E1(st).  Now suppose e1≠e2.  Then for
some state st and tuple w, w ∈ (e1-e2)+(e2-e1)(st).  We may
assume that w ∈ R(st).  Then we get w ∉ E1(st), but w ∈
E1(ts1(st)).  Hence E1(ts1(st)) is not a subset of E1(st).

(ii) Suppose e1=e2 wrt w.  Then for all states st, w ∉
(e1-e2)+(e2-e1)(st).  So if w' ∈ E1(st), then w' ∈
(e1-e2)+(e2-e1)(st), w' ∉ R(st) and w'≠w.  Therefore w' is also
in E1(ts2(st)).  Now suppose e1≠e2 wrt w.  This means that for
some st, w ∈ (e1-e2)+(e2-e1)(st).  We may assume that R(st)=∅.
Then w ∈ E1(st), but w ∉ E1(ts2(st)).  Hence E1(st) is not a
subset of E1(ts2(st)).

□

Using this last theorem, we can show the unsolvability of two
properties of Theorem 7.1.  This follows from the undecidability
of the equivalence of expressions.

As we have done in the past, we will repeat the question:
"Are these formulas complete with the set difference operator
omitted?"  They are not complete, and the following example will
illustrate this:

s:  R(2), S(2), R:1->2, S:1->2
e:  (R⊗S)[1=3]
ts: insert(R,(0,1)); insert(S,(0,1))


We calculate:

Noins(R) = false
Noins(S) = false
Iok(R⊗S) = false
Iok(e) = false
Ins(e) = {(0,1,0,1)}

Yet we can show that for every state st such that ts(st) is a
state, e(ts(st))-e(st) ⊆ Ins(e):  Suppose w ∈ e(ts(st))-e(st).
Then either w[1,2] ∈ R(ts(st))-R(st) or w[3,4] ∈ S(ts(st))-S(st).
If the first case holds, then w[1,2]=(0,1).  Now w[3]=w[1]=0 and
w[3,4] ∈ S(ts(st)), and therefore w[4]=1 because we know that
(0,1) ∈ S(ts(st)), S:1->2 and that this FD is satisfied in
S(ts(st)).  Hence w = (0,1,0,1) ∈ Ins(e).  The second case is
symmetric.  Thus we calculated Iok(e)=false but (∀st)
e(ts(st))-e(st) ⊆ Ins(e) is true.  By
rewriting the expression e, we can gain some understanding of why
this occurs.  The expression e can be written as a join:  R[1=1]S.
The FDs on R and S mean that each of the join terms is a key for
its relation.  Since the join terms of R and S are keys, the fact
that ts does not cause any constraint violation means that there
was no tuple in R(st) or S(st) with a '0' in domain 1.
Therefore, no connection traps could occur.
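The argument can be checked exhaustively over a small domain.  The following Python sketch assumes the domain {0,1,2} for both attributes; it enumerates every pair of states of R and S satisfying the FD 1->2, skips those for which ts(st) violates an FD (and so is not a state), and tests the containment e(ts(st))-e(st) ⊆ Ins(e).  The helper names are illustrative.

```python
from itertools import combinations, product


def satisfies_fd(rel):
    """FD 1->2 on a binary relation: domain 1 determines domain 2."""
    seen = {}
    for a, b in rel:
        if seen.setdefault(a, b) != b:
            return False
    return True


def e(R, S):
    """(R x S)[1=3], i.e. the join R[1=1]S."""
    return {r + s for r in R for s in S if r[0] == s[0]}


def subsets(items):
    return [set(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


tuples = list(product([0, 1, 2], repeat=2))
legal = [rel for rel in subsets(tuples) if satisfies_fd(rel)]
INS_E = {(0, 1, 0, 1)}

ok = True
checked = 0
for R in legal:
    for S in legal:
        R2, S2 = R | {(0, 1)}, S | {(0, 1)}   # effect of ts
        if not (satisfies_fd(R2) and satisfies_fd(S2)):
            continue                           # ts(st) is not a state
        checked += 1
        new = e(R2, S2) - e(R, S)
        if not new <= INS_E:
            ok = False

print(checked, ok)
```

No state in this enumeration yields a spurious insertion, in agreement with the argument above, although the formula gives Iok(e) = false.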

We see, then, that extra properties can be inferred from the
join operator which are not apparent when only cross product (in
addition to restriction) is available.  In the next few
paragraphs, we will discuss the rules which could be incorporated
into the predicates Noins, Nodel, Iok and Dok if join were a
primitive relational algebra operator.

For Noins, we can see that Noins(e1[X=Y]e2) should be true
when Noins(e1) and Noins(e2) are true.  This rule is the same for
cross product.  But in addition, we can observe that nothing will
be inserted into e1[X=Y]e2 if whatever is inserted into e1 only
can connect with tuples which are deleted from e2, and vice
versa, provided that connection traps do not occur.

For Nodel, the rule will be identical to the one for cross
product.  Deletions cannot be "cancelled" as insertions can be
for Noins.


For Iok, we can generalize from the previous example to obtain
the rule that Iok(e1[X=Y]e2) is true if X is a key of e1 (written
X->e1), if Y is a key of e2, if Iok(e1) and Iok(e2) are both
true, and if Ins(e1)[X]=Ins(e2)[Y].  We need Iok(e1) and Iok(e2)
true as the following example shows:

s:  R(2), S(2), T(2)
    R:1->2, S:1->2, T:1->2
e:  (R[2=1]S)[1=1]T
ts: insert(R,(0,1)); insert(S,(1,2)); insert(T,(0,3))

We calculate:

(1->(R[2=1]S)) = true
(1->T) = true
Ins(R) = {(0,1)}
Ins(S) = {(1,2)}
Ins(T) = {(0,3)}
Ins(R[2=1]S) = {(0,1,1,2)}
(Ins(T)[1]=Ins(R[2=1]S)[1]) = true
Ins(e) = {(0,1,1,2,0,3)}
Iok(R[2=1]S) = false (since 2 is not a key of R)
Iok(T) = true

We see that every requirement for Iok(e) to be true is satisfied
except Iok(R[2=1]S)=true.  But this is enough for spurious
insertions to occur.  Take the state st where

R(st) = {(4,1)}
S(st) = ∅
(R[2=1]S)(st) = ∅
T(st) = {(4,5)}
e(st) = ∅

Then for ts(st) we have:

R(ts(st)) = {(0,1),(4,1)}
S(ts(st)) = {(1,2)}
(R[2=1]S)(ts(st)) = {(0,1,1,2),(4,1,1,2)}
T(ts(st)) = {(0,3),(4,5)}
e(ts(st)) = {(0,1,1,2,0,3),(4,1,1,2,4,5)}

Thus a tuple has been inserted which was not in Ins(e).
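The state above can be replayed mechanically.  This is a small Python sketch; the join helper and variable names are illustrative, and Ins(e) is computed as the join of the Ins sets of the components.

```python
def join(a, b, x, y):
    """a[x=y]b with 1-based domain numbers."""
    return {r + s for r in a for s in b if r[x - 1] == s[y - 1]}


def e(R, S, T):
    """The expression (R[2=1]S)[1=1]T."""
    return join(join(R, S, 2, 1), T, 1, 1)


# Initial state st.
R, S, T = {(4, 1)}, set(), {(4, 5)}
# State ts(st) after the three insertions.
R2, S2, T2 = R | {(0, 1)}, S | {(1, 2)}, T | {(0, 3)}

ins_e = join(join({(0, 1)}, {(1, 2)}, 2, 1), {(0, 3)}, 1, 1)
spurious = (e(R2, S2, T2) - e(R, S, T)) - ins_e

print(ins_e, spurious)
```

The set spurious contains exactly the tuple built from the old R tuple (4,1), the newly inserted S tuple (1,2) and the old T tuple (4,5).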


We also need the clause Ins(e1)[X]=Ins(e2)[Y] in the rule for
join because everything appearing in Ins(e1) and Ins(e2) must
appear in Ins(e1[X=Y]e2) = Ins(e1)[X=Y]Ins(e2).  The next example
illustrates this:

s:  R(2), S(2), R:1->2, S:1->2
e:  R[1=1]S
ts: insert(R,(0,1)); insert(R,(2,3)); insert(S,(0,4))

We calculate:

Ins(R) = {(0,1),(2,3)}
Ins(S) = {(0,4)}
(Ins(R)[1]=Ins(S)[1]) = false
Iok(R) = true
Iok(S) = true
(1->R) = true
(1->S) = true
Ins(e) = {(0,1,0,4)}

Here every requirement for Iok(e) to be true is satisfied except
Ins(R)[1]=Ins(S)[1].  Again, this is enough for spurious
insertions to occur.  The following state has one:

R(st) = ∅
S(st) = {(2,5)}
e(st) = ∅
R(ts(st)) = {(0,1),(2,3)}
S(ts(st)) = {(0,4),(2,5)}
e(ts(st)) = {(0,1,0,4),(2,3,2,5)}

Thus a tuple has been inserted which was not in Ins(e).
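The failing clause and the resulting spurious tuple can both be computed.  The Python sketch below uses illustrative helper names and takes Ins(e) to be Ins(R)[1=1]Ins(S), as stated for joins.

```python
def join(a, b, x, y):
    """a[x=y]b with 1-based domain numbers."""
    return {r + s for r in a for s in b if r[x - 1] == s[y - 1]}


def project(rel, cols):
    """Projection on 1-based domain numbers."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


ins_R = {(0, 1), (2, 3)}
ins_S = {(0, 4)}

# The clause Ins(R)[1] = Ins(S)[1] required by the join rule for Iok.
clause_holds = project(ins_R, [1]) == project(ins_S, [1])

ins_e = join(ins_R, ins_S, 1, 1)

# Initial state st and the state ts(st).
R, S = set(), {(2, 5)}
R2, S2 = R | ins_R, S | ins_S
spurious = (join(R2, S2, 1, 1) - join(R, S, 1, 1)) - ins_e

print(clause_holds, ins_e, spurious)
```

The inserted R tuple (2,3) has no partner in Ins(S), so it is lost from Ins(e), yet it connects with the old S tuple (2,5).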

For Dok, the rule for the join operator should be the same as
the one for cross product.  Functional dependencies are not
relevant as they were for Iok.  This is because whenever a tuple
w is calculated as being deleted from one component of a join,
Del will calculate every possible tuple w⊗w' as being deleted
from the join.

We will record these new rules for join in the following theorem:


Theorem 7.4.  Let s ∈ RSch; let e1, e2 ∈ Exp(s), and let ts ∈
Ts(s).  Consider the following definitions:

Noins(e1[X=Y]e2) =
    Noins(e1) • Noins(e2) +
    Noins(e1) • Iok(e2) • Ins(e2[Y]) ⊆ ##Del(e1[X]) +
    Noins(e2) • Iok(e1) • Ins(e1[X]) ⊆ ##Del(e2[Y]) +
    Iok(e1) • Iok(e2) • Ins(e1[X]) ⊆ ##Del(e2[Y]) •
        Ins(e2[Y]) ⊆ ##Del(e1[X])

Nodel(e1[X=Y]e2) = Nodel(e1) • Nodel(e2)

Iok(e1[X=Y]e2) = Iok(e1) • Iok(e2) • X->e1 • Y->e2 •
    Ins(e1)[X] = Ins(e2)[Y]

Dok(e1[X=Y]e2) = Dok(e1) • Dok(e2) +
    Nodel(e1) • Dok(e2) +
    Dok(e1) • Nodel(e2)

Then for every state st such that ts(st) ∈ RSt(s) we have,
letting e = e1[X=Y]e2:

(i)   Noins(e) => e(ts(st)) ⊆ e(st),
(ii)  Nodel(e) => e(st) ⊆ e(ts(st)),
(iii) Iok(e) => e(ts(st))-e(st) ⊆ Ins(e),
(iv)  Dok(e) => e(st)-e(ts(st)) ⊆ ##Del(e).

Proof.  (i) Suppose Noins(e) by the truth of the first clause.
Then we prove e(ts(st)) ⊆ e(st) as with cross product.

Now suppose the second clause holds.  We will show e(ts(st)) ⊆
e(st) by contradiction.  So suppose w ∈
e1[X=Y]e2(ts(st))-e1[X=Y]e2(st).  Then either p1(w) ∈
e1(ts(st))-e1(st) or p2(w) ∈ e2(ts(st))-e2(st).  Since Noins(e1)
is true, the latter case must hold.  Because Iok(e2) is true,
p2(w) ∈ Ins(e2), and so p2(w)[Y] ∈ Ins(e2[Y]) ⊆ ##Del(e1[X]).
Since p1(w)[X]=p2(w)[Y], p1(w)[X] ∉ e1[X](ts(st)) and so p1(w) ∉
e1(ts(st)), contradicting w ∈ e(ts(st)).

The third and fourth clauses are handled similarly.

(ii) We prove the property of Nodel as with cross product.

(iii) Suppose Iok(e), and suppose w ∈ e(ts(st))-e(st).  Then
either p1(w) ∈ e1(ts(st))-e1(st) or p2(w) ∈ e2(ts(st))-e2(st).
In the former case, since Iok(e1) is true, p1(w) ∈ Ins(e1).
Since Ins(e1)[X] = Ins(e2)[Y], there is some w2 ∈ Ins(e2) with
p1(w)[X]=w2[Y].  Now w2 ∈ e2(ts(st)), Y->e2 holds, and
w2[Y]=p1(w)[X]=p2(w)[Y].  Therefore w2=p2(w), so w = p1(w)⊗p2(w)
∈ Ins(e1)[X=Y]Ins(e2) = Ins(e).

The case where p2(w) ∈ e2(ts(st))-e2(st) is handled similarly.

(iv) We prove this as for cross product.

□

We have specified some additional rules for the predicates
Noins, Nodel, Iok and Dok.  These new rules may or may not make
the rules complete, but before the completeness problem is
attacked, the problem of incorporating the new rules into the
formulas must be faced.  It is not sufficient to simply append
the rules to Definition 5.10 because "hidden joins" may still
occur.  That is, the rules must be made to apply to expressions
such as (R⊗S)[1=3] which is equivalent to the join R[1=1]S.  The
transformations given in Theorem 5.4 will probably be very useful
in solving this problem.

7.3.  Other Types of Operation Mappings

We recall from Definition 2.4 that a type 1 mapping satisfies
the commutative diagram whether or not any constraints are
violated in either schema s1 or schema s2.

We first discuss the meaning of the type 1 property.

We discussed type 3 mappings first because they have a simple
intuitive explanation.  Mappings of this type correctly interpret
view operations as long as the "base" schema constraints are not
violated.  Presumably, a mapping processor is situated above the
base schema, and so when an operation mapping issues operations
to effect the interpretation, any constraint violation will
terminate and roll back the mapping process.  Hence, correct
interpretation is not needed if the base constraints are
violated.  Note, however, that if the base schema is changed by a
modification of the set of constraints, a type 3 mapping may need
to be rewritten so that it will still interpret operations
correctly.  That is, an initial base state on which the mapping
previously interpreted operations incorrectly, but which was also
transformed into an illegal state, may after the constraint
modification result in a legal state, and the mapping will not
be correct on this state.  With type 1 mappings this cannot
happen.  Since the correctness of type 1 mappings does not depend
on the final base state being allowable, constraint violations
cannot cause incorrect operation interpretations.  Thus the
constraints in the base schema may be modified in any way, and a
correct type 1 mapping will remain correct.

When the type 1 property is applied to the data model
RDm=(RSch,RStr,RSt,RQ) and mapping model RMm=(RMs,RMg), and when
the three semantics for RQ are considered, we get a set of
definitions paralleling Definition 6.5.  These we give now:

Definition 7.2.  Let s ∈ RSch, e ∈ Exp(s), ts ∈ Ts(s) and w ∈
VTup(n).  Six predicates are defined as follows (valuations are
implicit):

T1-Str-Ins(ts,e,w) = ∀st, st ∈ RSt(s) => e(ts(st)) = e(st)+{w},
T1-Str-Del(ts,e,w) = ∀st, st ∈ RSt(s) => e(ts(st)) = e(st)-##w,
T1-Med-Ins(ts,e,w) = ∀st, st ∈ RSt(s) => e(st)+{w} ⊆ e(ts(st)),
T1-Med-Del(ts,e,w) = ∀st, st ∈ RSt(s) => e(ts(st)) ⊆ e(st)-##w,
T1-Wk-Ins(ts,e,w)  = ∀st, st ∈ RSt(s) => w ∈ e(ts(st)),
T1-Wk-Del(ts,e,w)  = ∀st, st ∈ RSt(s) => e(ts(st))∩##w = ∅.

□
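The six predicates can be tested on a finite sample of initial states.  The Python sketch below makes two assumptions that are not stated in this section: ##w is read as the set of all tuples agreeing with w on its non-blank domains, and the quantifier over RSt(s) is replaced by a supplied list of legal states, so a result of True only means that no counterexample was found in the sample.  All function names are illustrative.

```python
BLANK = None  # stands for the blank '-' in a tuple with blanks


def matches(w, t):
    """True if t agrees with w on every non-blank domain (assumed reading of ##w)."""
    return all(a is BLANK or a == b for a, b in zip(w, t))


def without(rel, w):
    """rel - ##w under the assumed reading."""
    return {t for t in rel if not matches(w, t)}


def t1_str_ins(e, ts, w, states):
    return all(e(ts(st)) == e(st) | {w} for st in states)


def t1_str_del(e, ts, w, states):
    return all(e(ts(st)) == without(e(st), w) for st in states)


def t1_med_ins(e, ts, w, states):
    return all(e(st) | {w} <= e(ts(st)) for st in states)


def t1_med_del(e, ts, w, states):
    return all(e(ts(st)) <= without(e(st), w) for st in states)


def t1_wk_ins(e, ts, w, states):
    return all(w in e(ts(st)) for st in states)


def t1_wk_del(e, ts, w, states):
    return all(not any(matches(w, t) for t in e(ts(st))) for st in states)


# Hypothetical example: e is the base relation R and ts inserts (1,2) into R.
def e(st):
    return st["R"]


def ts(st):
    return {**st, "R": st["R"] | {(1, 2)}}


states = [{"R": set()}, {"R": {(3, 4)}}, {"R": {(1, 2), (3, 4)}}]
print(t1_str_ins(e, ts, (1, 2), states), t1_wk_del(e, ts, (1, BLANK), states))
```

Note that the sample states need only be legal initial states; no check is made that ts(st) is legal, which is the point of the type 1 property.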

Using the constructs Ins, Del, Noins, Nodel, Iok and Dok, we
have  characterized  type  3  mappings.  It  is  reasonable  to  ask  how 
these  constructs  can  be  modified  to  apply  to  type  1  mappings. 

We  need  to  remove  references  in  these  constructs  to 
constraints  which  might  not  be  satisfied  in  the  result  state. 

On  an  expression  e,  we  can  identify  two  types  of  constraints 
which  might  hold:  There  are  constraints  which  have  been  derived 
from  constraints  in  the  schema  and  which  therefore  can  fail  when 
the  schema  constraints  are  violated.  There  are  also  constraints, 
basically  EQs  and  FDs  derived  from  them,  which  are  valid  solely 
because  of  the  relational  operators  used  to  construct  e.  For 
example, in R[1=2], the EQ 1=2 will always be valid no matter
what  state  R  is  in. 

For Ins and Del, constraints are only used via the function
Cv.  Thus, in order to modify Ins and Del to calculate T1-Wk-Ins
and T1-Wk-Del, we only need to change Cv.  The following
definition shows how this is done.

Definition 7.3.  Let s ∈ RSch, e ∈ Exp(s) of degree n.  The set
Drv1(e) of derivable constraints on e is defined by induction as
follows:

[1] Drv1(R) = ∅.
[2]-[7] as in Definition 4.12 with Drv1 replacing Drv.

The set of constraint violators (with respect to type 1),
Cv1(e), is defined as in Definition 5.6 with Drv1 replacing Drv.

The set Ins1(ts,e) is defined as in Definition 6.7 with Ins
replaced by Ins1, Del by Del1 and Cv by Cv1.

The set Del1(ts,e) is defined as in Definition 6.9 with Del1
replacing Del, Ins1 replacing Ins, and all clauses the same
except clause [1] which should read:

[1] Del1(R) = T(ts) (T as in Definition 5.9 clause [1])

□

The  soundness  of  these  rules  is  expressed  by  the  following 
theorem  which  corresponds  to  Theorem  6.4: 

Theorem 7.5.  Let s ∈ RSch, ts ∈ Ts(s), e ∈ Exp(s) of degree n,
and w ∈ VTup(n).  Then

(i) if w ∈ Ins1(ts,e), then T1-Wk-Ins(ts,e,w), and

(ii) if w ∈ Del1(ts,e), then T1-Wk-Del(ts,e,w).

Proof.  Similar to proof of Theorem 6.4.

□




7.4.  Summary and Conclusions

This  chapter  has  continued  the  investigation  of  operation 
mappings  begun  in  Chapter  6.  We  investigated  procedures  for 
determining  the  correctness  of  operation  mappings  with  respect  to 
medium  and  strong  semantics.  We  were  able  to  prove  the  soundness 
of  a  set  of  procedures,  but  we  could  provide  no  completeness 
proof.  We  also  briefly  looked  at  correctness  of  mappings  when 
there  was  no  condition  that  the  underlying  constraints  not  be 
violated. 

Since  it  is  medium  and  strong  semantics  which  are  of  more 
interest  as  view  semantics,  it  is  unfortunate  that  complete  rules 
cannot  be  given  at  this  time. 

The fact that we could very easily generalize the procedures
for type 3 mappings, where constraints are always assumed to be
satisfied, to apply to type 1 mappings, in which there are no
such assumptions, indicates the value of our algebraically
flavored approach to operation mapping correctness.




PART  V 


Concluding Remarks


CHAPTER  8 


Applications  to  Other  Data  Models 


In Chapter 1 the point was made several times that it is
desirable  to  have  database  systems  which  can  support  several  data 
models.  Yet  Parts  III  and  IV  studied  only  one  particular  data 
model.  This  is  not  a  limitation  on  the  value  of  the  results, 
however,  because  the  following  procedure  is  available  for 
applying  these  results  to  other  data  models:  In  order  to  derive 
properties  of  a  mapping  model  Mm,  say  between  arbitrary  data 
models  Dml  and  Dm2,  define  Dml  and  Dm2  "in  terms  of"  the 
relational  model  RDm  (or  some  extension  of  it)  which  we  have 
studied,  and  define  the  mapping  model  "in  terms  of"  the  mapping 
model  RMm  which  we  have  studied.  Then  problems  involving  Mm,  Dml 
and  Dm2  become  problems  involving  RMm  and  RDm  to  which  we  can 
apply  the  results  already  obtained. 

We will give an example in this short chapter of how a
hierarchical data model can be defined using the relational model
of  Parts  III  and  IV.  We  will  also  define  some  mappings  between 
hierarchical  schemas  by  using  the  relational  structure  mappings 
of  the  previous  chapters.  Then  we  will  use  the  algorithms  from 
Part  III  to  show  that  the  hierarchical  mappings  are  consistent. 

8.1.  Hierarchical  Mappings  as  Relational  Mappings 

The hierarchical model we define corresponds to one appearing
in the literature[Dale76,77].  This model has the following
characteristics:  Schemas are trees of record types.  Each record
type has a key field and a data field.  The data field only
depends on its own record's key, not on any of the keys of
ancestor records.  For example, the hierarchy:

    Dept(D#,Loc)
        Student(S#,Name)

has data fields not depending on ancestor keys, but the hierarchy

    Project(J#,Loc)
        Part(P#,Quantity)

has a data field, Quantity, which depends on P# and on J#.

A  state  corresponding  to  such  a  hierarchical  schema  is  a  set 
of  trees  matching  the  pattern  of  the  schema  and  satisfying  the 
key  constraints. 

We thus want to formally define a data model 4-tuple
HDm=(HSch,HStr,HSt,HQ).  The set HSch of hierarchical schemas is
defined by the following syntax:

hsch ::= seg
seg  ::= (name desc)
desc ::= desc seg | ∅ (empty string)

Elements  of  this  set  of  schemas  are  linear  representations  of 
trees.  Some  examples  follow: 

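(A) represents the tree consisting of the single node A.

(A (B)) represents the tree with root A and one child B.

(A (B) (C) (D)) represents the tree with root A and three
children B, C and D.

(A (B (C) (D)) (E (F (G)))) represents the tree with root A and
children B and E, where B has children C and D, and E has the
single child F, which in turn has the single child G.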

Each  segment  declared  in  a  hierarchy  is  (informally) 
understood  to  have  a  hierarchical  key  and  a  data  field. 

Now we want to define HStr and HSt.  We do not need to define
hierarchical structures and states to be trees as mentioned
above.  It is only necessary that we define something isomorphic
to trees.  Since, as we have said, we are defining HDm in terms
of RDm, we will want HStr to be a subclass of RStr and HSt to be
a subclass of RSt.  In order to specify which relational
structures are equivalent to sets of trees, we need to translate
hierarchical schemas to relational schemas taking into account
the tree structures and also the implicit key and data field
structures.  To do this, we specify a translation j:HSch->RSch'.
(Recall that the prime "'" indicates relational schemas with
subset constraints as defined in Chapter 5.)  Basically, j is
defined by the process of first normalization[Codd70].  The
following examples translate the example schemas given above:

(A) --j--> A(2), A: 1->2

(A (B)) --j--> A(2), B(3), A: 1->2, B: 2->3, B[1] ⊆ A[1]

(A (B) (C) (D)) --j--> A(2), B(3), C(3), D(3),
    A: 1->2, B: 2->3, C: 2->3, D: 2->3,
    B[1] ⊆ A[1], C[1] ⊆ A[1], D[1] ⊆ A[1]

(A (B (C) (D)) (E (F (G)))) --j-->
    A(2), B(3), C(4), D(4), E(3), F(4), G(5),
    A: 1->2, B: 2->3, C: 3->4, D: 3->4, E: 2->3, F: 3->4, G: 4->5,
    B[1] ⊆ A[1], C[1,2] ⊆ B[1,2], D[1,2] ⊆ B[1,2], E[1] ⊆ A[1],
    F[1,2] ⊆ E[1,2], G[1,2,3] ⊆ F[1,2,3]
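
Since j is given here only by example, the following sketch records the pattern that the four examples appear to follow: a segment at depth d (root at depth 1) becomes a relation of degree d+1 with the FD d->d+1, and every non-root segment carries a subset constraint on columns 1..d-1 into its parent. This rule is inferred from the examples and is an assumption, as are the function name j and the dictionary layout of the result.

```python
def j(tree, depth=1, parent=None, out=None):
    """Translate a schema tree (name, [children]) into a relational schema.

    Returns a dict with
      'arity'   : {relation name: degree}
      'fds'     : [(relation, lhs column, rhs column)]
      'subsets' : [(relation, columns, parent relation, columns)]
    The translation rule is inferred from the worked examples only.
    """
    if out is None:
        out = {"arity": {}, "fds": [], "subsets": []}
    name, children = tree
    out["arity"][name] = depth + 1            # inherited keys, own key, data
    out["fds"].append((name, depth, depth + 1))
    if parent is not None:
        cols = list(range(1, depth))          # the inherited key columns
        out["subsets"].append((name, cols, parent, cols))
    for child in children:
        j(child, depth + 1, name, out)
    return out


example = ("A", [("B", [("C", []), ("D", [])]), ("E", [("F", [("G", [])])])])
rs = j(example)
```

For the last example schema, rs["arity"] gives degree 5 for G and rs["subsets"] contains the constraint G[1,2,3] ⊆ F[1,2,3], in agreement with the translation listed above.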

The next two components of HDm are now easily defined. Given s ∈
HSch,

HStr(s) = RStr'(j(s)) and
HSt(s) = RSt'(j(s)).

(HQ will not be defined here.) Thus hierarchical structures are
relational structures for the normalized schema, and hierarchical
states are relational states for the normalized schema.


In the literature [Dale76, Dale77], there were three kinds of
hierarchical structure mappings defined. They were defined as
functions, which means that there was at most one member in
HMs(s1,s2) given any s1, s2 ∈ HSch. The three types are given by
the following syntax:

hmap ::= lift name | interchange name | shift name to name

We shall illustrate their intuitive meaning with some examples:


[Figure: example trees before and after the lift, interchange and shift mappings.]

In order to define what these mappings mean, we will associate
a member of RMs with each one, as the following diagram indicates,
where s1, s2 ∈ HSch and m ∈ HMs:

    s1 ----j----> j(s1)
    |               |
    m             j(m)
    |               |
    v               v
    s2 ----j----> j(s2)

    j(m) ∈ RMs(j(s1), j(s2))

So for each of the above examples we give the associated member
of RMs(j(s1), j(s2)):




For  lifting,  the  assignment  is: 


    (A (B (C)) (D)) ----j----> rs1
           |                    |
         lift B                 rm
           |                    |
           v                    v
    (B (A (C) (D))) ----j----> rs2

where rs1 is A(2), B(3), C(4), D(3),
    A: 1->2, B: 2->3, C: 3->4, D: 2->3,
    B[1] ⊆ A[1], C[1,2] ⊆ B[1,2], D[1] ⊆ A[1]

rs2 is B(2), A(3), C(4), D(4),
    B: 1->2, A: 2->3, C: 3->4, D: 3->4,
    A[1] ⊆ B[1], C[1,2] ⊆ A[1,2], D[1,2] ⊆ A[1,2]

and rm is

    B = B[2,3]
    A = (A[1=1]B)[4,1,2]
    C = C[2,1,3,4]
    D = (D[1=1]B)[5,1,2,3]


This says that an A-segment is below a B-segment in the target
schema if the B-segment was below the A-segment in the original
schema. A C-segment is below a B-A-branch in the target schema
if that C-segment is below the equivalent A-B-branch originally.
A D-segment is below a B-A-branch in the target schema if the
D-segment is below the A-segment which is above the B-segment.
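
To make the column bookkeeping in rm concrete, the sketch below evaluates the four lifting expressions on a small sample state. Relations are modelled as Python sets of tuples with 1-based column numbers; the helper names project, join and lift_b, and the sample data, are assumptions made for this illustration.

```python
def project(rel, cols):
    """e[c1,...,cn]: projection onto the listed 1-based columns."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def join(r, s, i, k):
    """r[i=k]s: concatenated tuples where column i of r equals column k of s."""
    return {t + u for t in r for u in s if t[i - 1] == u[k - 1]}


def lift_b(state):
    """Apply the mapping rm given for 'lift B' to a state of rs1."""
    A, B, C, D = state["A"], state["B"], state["C"], state["D"]
    return {
        "B": project(B, [2, 3]),
        "A": project(join(A, B, 1, 1), [4, 1, 2]),
        "C": project(C, [2, 1, 3, 4]),
        "D": project(join(D, B, 1, 1), [5, 1, 2, 3]),
    }


# Sample rs1 state: one A-segment with two B-children, one C under b1, one D.
state = {
    "A": {("a1", "da")},
    "B": {("a1", "b1", "db1"), ("a1", "b2", "db2")},
    "C": {("a1", "b1", "c1", "dc")},
    "D": {("a1", "d1", "dd")},
}
target = lift_b(state)
```

In target, each B-segment becomes a root, the A-segment appears once under each of b1 and b2, and the D-segment is repeated under both B-A-branches, as the explanation above describes.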

For  interchange,  the  assignment  is: 


    (A (B (C)) (D)) ----j----> rs1
           |                    |
      interchange B             rm
           |                    |
           v                    v
    (B (A (C)) (D)) ----j----> rs2

where rs1 is A(2), B(3), C(4), D(3),
    A: 1->2, B: 2->3, C: 3->4, D: 2->3,
    B[1] ⊆ A[1], C[1,2] ⊆ B[1,2], D[1] ⊆ A[1]

rs2 is B(2), A(3), C(4), D(3),
    B: 1->2, A: 2->3, C: 3->4, D: 2->3,
    A[1] ⊆ B[1], C[1,2] ⊆ A[1,2], D[1] ⊆ B[1]

and rm is

    B = B[2,3]
    A = (A[1=1]B)[4,1,2]
    C = C[2,1,3,4]
    D = (D[1=1]B)[5,2,3]

This says that a D-segment is under a B-segment in the target
schema if those D- and B-segments were under the same A-segment
in the original schema. The other assignments are similar to
those in the lifting.

For shifts, the assignment is:

    (A (B (C)) (D (E))) ----j----> rs1
             |                      |
        shift B to D                rm
             |                      |
             v                      v
    (A (D (B (C)) (E))) ----j----> rs2

where rs1 is A(2), B(3), C(4), D(3), E(4),
    A: 1->2, B: 2->3, C: 3->4, D: 2->3, E: 3->4,
    B[1] ⊆ A[1], C[1,2] ⊆ B[1,2],
    D[1] ⊆ A[1], E[1,2] ⊆ D[1,2]

rs2 is A(2), D(3), B(4), C(5), E(4),
    A: 1->2, D: 2->3, B: 3->4, C: 4->5, E: 3->4,
    D[1] ⊆ A[1], B[1,2] ⊆ D[1,2], C[1,2,3] ⊆ B[1,2,3],
    E[1,2] ⊆ D[1,2]

and rm is

    A = A
    D = D
    B = (B[1=1]D)[1,5,2,3]
    C = (C[1=1]D)[1,6,2,3,4]
    E = E

This says that a B-segment and its descendants are under a
D-segment if the B-subtree was under the same A-segment as the
D-segment.

We have defined (the static components of) the data model HDm
and the hierarchical mapping model HMm. We now want to show that
these structure mappings are always state mappings. To show that
a mapping m ∈ HMs(s1,s2) is consistent, we show that its image,
j(m) ∈ RMs'(j(s1), j(s2)), is consistent. We will prove this for
the interchange mapping. What we need to do, as we know from
Chapters 3 and 4, is translate the constraints in rs2 using the
mapping j(interchange B) to FDs and SSCs on rs1 and then derive
these translated constraints in rs1.

The FDs we want to derive in rs1 are the following:

    B[2,3]: 1->2
    (A[1=1]B)[4,1,2]: 2->3
    C[2,1,3,4]: 3->4
    (D[1=1]B)[5,2,3]: 2->3

It is not hard to see that these FDs are derivable, respectively,
from the following FDs on base relations:

    B: 2->3
    A: 1->2
    C: 3->4
    D: 2->3

The next step is to prove that all of the SSs in rs2 are
derivable from those in rs1. After substituting in the mapping
expressions, the rs2 subset constraints appear as the following
rs1 SSs:

(1) (A[1=1]B)[4,1,2][1] ⊆ B[2,3][1]

(2) C[2,1,3,4][1,2] ⊆ (A[1=1]B)[4,1,2][1,2]

(3) (D[1=1]B)[5,2,3][1] ⊆ B[2,3][1]

We will refer to the rules of Definitions 5.4 and 5.5 in the
derivations below.

Proof of (1):

    B: 2=2                               (5.4.6)
    B[2] ⊆ B[2]                          (5.4.18)
    B[2] ⊆ B[2,3][1]                     (5.5.1c with Y=2, X1=2,3 and Y'=2)
    (A⊗B)[4] ⊆ B[2,3][1]                 (5.5.4d)
    (A[1=1]B)[4] ⊆ B[2,3][1]             (5.5.2c)
    (A[1=1]B)[4,1,2][1] ⊆ B[2,3][1]      (5.5.1b with X=4, X1=5,1,2 and X'=1)

Proof of (2):

    C[1,2] ⊆ B[1,2]                            (schema constraint)
    C[1,2] ⊆ (A⊗B)[3,4]                        (5.5.4d)
    (A⊗B)[1=3]: 1=3                            (5.5.2a)
    C[1,2] ⊆ (A[1=1]B)[3,4]                    (5.5.2d)
    (A[1=1]B): 3,4 = 1,4                       (5.4.5, 5.5.2a)
    (A[1=1]B)[3,4] ⊆ (A[1=1]B)[1,4]            (5.4.18)
    C[1,2] ⊆ (A[1=1]B)[1,4]                    (5.4.12)
    C[1,2] ⊆ (A[1=1]B)[4,1,2][2,1]             (5.5.1c with Y=1,2, X1=4,1,2
                                                and Y'=2,1)
    C[2,1] ⊆ (A[1=1]B)[4,1,2][1,2]             (5.4.13 with Z=2,1)
    C[2,1,3,4][1,2] ⊆ (A[1=1]B)[4,1,2][1,2]    (5.5.1b with X=2,1,
                                                X1=2,1,3,4 and X'=1,2)

Proof of (3):

    B: 2=2                               (5.4.6)
    B[2] ⊆ B[2]                          (5.4.18)
    B[2] ⊆ B[2,3][1]                     (5.5.1c with Y=2, X1=2,3 and Y'=2)
    (D⊗B)[5] ⊆ B[2,3][1]                 (5.5.4c)
    (D[1=1]B)[5] ⊆ B[2,3][1]             (5.5.2c)
    (D[1=1]B)[5,2,3][1] ⊆ B[2,3][1]      (5.5.1c with Y=5, X1=5,2,3
                                          and Y'=1)

Thus we see that interchange B is a state mapping. What we
have done, of course, is proved that one particular mapping:

(A (B (C)) (D)) --interchange B--> (B (A (C)) (D))

is consistent. We have not shown that all interchange mappings
have this property.
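
The derivation above is a proof for all states of rs1. As a much weaker sanity check, one can also evaluate j(interchange B) on a single sample state and test the rs2 constraints directly; passing such a test proves nothing in general. The sketch below does this with relations as sets of tuples. The helper names and the sample state are assumptions introduced for illustration.

```python
def project(rel, cols):
    """e[c1,...,cn]: projection onto the listed 1-based columns."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def join(r, s, i, k):
    """r[i=k]s: concatenated tuples where column i of r equals column k of s."""
    return {t + u for t in r for u in s if t[i - 1] == u[k - 1]}


def interchange_b(st):
    """Apply the mapping rm given for 'interchange B' to a state of rs1."""
    return {
        "B": project(st["B"], [2, 3]),
        "A": project(join(st["A"], st["B"], 1, 1), [4, 1, 2]),
        "C": project(st["C"], [2, 1, 3, 4]),
        "D": project(join(st["D"], st["B"], 1, 1), [5, 2, 3]),
    }


def holds_fd(rel, x, y):
    """True if the single-column FD x->y holds in rel."""
    seen = {}
    for t in rel:
        if seen.setdefault(t[x - 1], t[y - 1]) != t[y - 1]:
            return False
    return True


def holds_subset(r, rcols, s, scols):
    """True if r[rcols] is a subset of s[scols]."""
    return project(r, rcols) <= project(s, scols)


# A sample state satisfying the rs1 constraints (hypothetical data).
rs1_state = {
    "A": {("a1", "da")},
    "B": {("a1", "b1", "db1"), ("a1", "b2", "db2")},
    "C": {("a1", "b1", "c1", "dc")},
    "D": {("a1", "d1", "dd")},
}
img = interchange_b(rs1_state)

# The rs2 constraints listed for the interchange example.
rs2_fds = [("B", 1, 2), ("A", 2, 3), ("C", 3, 4), ("D", 2, 3)]
rs2_subsets = [("A", [1], "B", [1]), ("C", [1, 2], "A", [1, 2]), ("D", [1], "B", [1])]

fds_ok = all(holds_fd(img[n], x, y) for n, x, y in rs2_fds)
subsets_ok = all(holds_subset(img[r], rc, img[s], sc) for r, rc, s, sc in rs2_subsets)
```

For this sample state both fds_ok and subsets_ok are true, which is consistent with (but does not replace) the derivations of the FDs and of (1) through (3).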

8.2. Summary and Conclusions

This chapter has given an example of the way in which we can
apply the previous results of this thesis to a theoretically
unlimited number of data models. We indicated how to define a
hierarchical model in terms of our relational model, and we
indicated how to define a class of hierarchical mappings in terms
of our relational algebra mapping model. We say "indicated"
because we only specified the translations by example; no closed
definition was given. A little thought will show that a
formalism such as denotational semantics (e.g., [deBa], [BiGl],
[BiGN], [Dona75], [Dona76], [Tenn]) will work very well for this
purpose.

This  chapter  has  shown  the  importance  of  subset  constraints 
for  representing  other  data  models.  By  introducing  even  more 
constraint  types,  and  developing  mapping  rules  for  them,  we  will 
be  able  to  provide  concrete  algorithms  for  more  varied  data 
models.  It  seems  feasible,  then,  that  with  the  proper 
mathematical  foundation  as  described  in  this  thesis,  all  of  the 
desirable  database  facilities  described  in  Chapter  1  can  be 
implemented  in  a  sound  and  controlled  manner. 


CHAPTER 9


Contributions and Future Work


In  Part  I  we  argued  that  mappings  will  play  a  prominent  role 
in  advanced  database  management  systems.  It  was  therefore  an 
important  problem  to  study  formalisms  for  mappings,  to 
investigate  the  use  of  these  formalisms  to  define  precisely  what 
properties  of  mappings  are  desirable  and  to  study  methods  of 
determining  when  mappings  have  these  desirable  properties. 

Part  II  was  concerned  with  mathematical  aspects  of  mappings. 
Chapter  2  studied  ways  to  define  mappings  using  abstract  machine 
formalisms.  This  formalism  was  good  for  describing  operation 
mappings,  but  it  could  not  express  query  mappings  well.  Our 
formalism  was  shown  to  be  superior  to  the  only  other  published 
formalism for database mappings. The problem of parametrization
of data model operations was not covered adequately by any
formalism.

The  fact  that  structure  mappings  have  a  close  relationship  to 
definitional  axioms  in  Predicate  Calculus  was  shown  in  Chapter  3. 
With  this  relationship,  we  derived  some  very  general  requirements 
for  desirable  (consistent)  structure  mappings.  Since  these 
requirements involved proving theorems of Predicate Calculus, the
general problem of determining consistent structure mappings is
unsolvable. We concluded that a major goal in developing mapping
technology  is  to  devise  languages  for  which  algorithms  exist  to 
decide  the  desirable  properties  of  structure  mappings. 

In Part III we demonstrated that algorithms can be written
which can decide when relational structure mappings preserve
functional dependency constraints. We had to eliminate the set
difference operator, since, with it, the problem is unsolvable.
The basic relational model with functional dependency and
equality constraints was extended to include subset constraints.
We showed that these kinds of constraints are important for
modeling real-world situations, that they can be integrated with
the functional dependency and equality constraints, and that
there are algorithms for deciding when mappings preserve these
kinds of constraints. We saw that extending the algorithm to
deal with multiple levels of mappings (a desirable feature for
database management systems) was a difficult task due to the
necessity for keeping track of non-loss joins.

Part  IV  studied  some  of  the  desirable  properties  of  a  simple 
class  of  operation  mappings.  We  established  that  sound 
algorithms exist for deciding when an operation mapping correctly
interprets  operations.  Again,  the  set  difference  operator  had  to 
be  removed  from  the  structure  mapping  language,  since  with  it  the 
problems  were  undecidable. 

Chapter  8  gave  an  example  showing  that  the  results  for  our 
relational  model  can  be  applied  to  other  models.  This 
demonstration  is  very  important  since  it  greatly  increases  the 
value  of  the  previous  results. 

On  the  whole,  we  can  draw  from  this  work  both  positive  and 
negative  conclusions. 

On  the  positive  side,  we  have  been  able  to  establish  a  useful 
framework  for  defining  mappings  and  their  properties.  We  were 
able  to  give  algorithms  for  a  number  of  the  most  important 
properties  of  mappings. 

On the negative side, we saw that undecidability was often a
problem. Even with the most basic set of constraint types and
the most basic kind of operation mappings, the power of the
structure mappings introduced undecidability in several places.

In addition, the facility to support multiple levels of views
was seen to involve some difficulties. To get complete rules for
state mappings through several levels, complicated algorithms
involving non-loss joins are necessary. To get reasonable
properties for operation mappings on higher level views, the
strongest possible properties are needed on the lower levels.




There  are  many  directions  for  future  work  which  this  research 
has suggested. We will briefly discuss some of these areas. The
problems  have  been  placed  in  three  categories:  practical 
directions,  extensions  and  specific  directions. 

In the practical domain, the building of a system which makes
use of the mapping concepts presented here would be valuable.
Such a system would have mapping processors as in the ANSI/SPARC
framework which would incorporate the algorithms of Parts III and
IV. The building of a database terminal would also be an
interesting project.

Both projects would require efficient algorithms for the
mapping processors. Since this research did not address the
question of computation time, this is another practical area for
work.

In  several  places  in  this  thesis  opportunities  were  noted  for 
extending  the  constructs  being  discussed: 

In  Chapter  2  we  remarked  that  our  formalism  did  not  model  very 
well  the  complex  parameters  which  occur  in  data  model  operations. 
Refinements  to  the  formalism  are  needed  to  take  into  account  this 
aspect of data models.

In  Chapter  3  we  discussed  the  usefulness  of  higher  order 
Predicate  Calculus.  We  would  like  to  incorporate  such  extensions 
into  a  mapping  language  and  extend  the  algorithms  of  Parts  III 
and  IV  to  cover  these  extensions.  The  possibilities  for 
expressing  access  rights  using  "before"  and  "after"  predicates 
should  also  be  investigated. 

In  Chapter  5  we  extended  the  constraint  types  to  include 
subset  constraints.  It  also  would  be  useful  to  have  other  types 
of  constraints  in  addition  to  the  ones  studied.  For  example, 
multi-valued dependencies [BeFH] would be useful to have, and
their interactions within single relations have already been
axiomatized.

In Part IV we studied operations which were simple insertions
or deletions. Update operations which modify existing tuples
should also be studied. We also restricted the operation
mappings to simple sequences of operations. Adding a control
structure to operation mappings, such as if- or while-statements,
would allow us to "model" more data models.

One of the "facts of life" of database management is that
things  do  not  stand  still.  In  particular,  schemas  in  a  database 
system  often  change  to  reflect  changing  priorities  or  usage 
patterns.  Since  there  is  often  a  large  investment  in 
applications  which  are  tied  to  particular  schemas,  it  is 
important  to  be  able  to  control  such  changes.  Within  the 
framework  provided  by  mappings,  we  can  study  this  important 
problem. 

In  real  systems  it  is  often  impossible  to  know  the  value  of 
all  attributes  of  every  entity.  This  problem  is  usually  dealt 
with by having special "null" data values (e.g., [ANSI75],
[Codd75], [InAl]). The way in which such null values affect
mappings  needs  to  be  studied. 

As  we  saw  in  Chapter  8,  denotational  semantics  will  be  a 
useful  tool  with  which  to  define  arbitrary  data  models  in  terms 
of  one  "canonical"  data  model.  Some  actual  cases  should  be 
worked  out  in  detail,  and  the  general  requirements  on  the  target 
data model and on the generic functions (the LISP-like functions
needed for most definitions [Dona75, Dona76]) need to be researched.

At several points in the thesis we also pointed out specific
problems which were left open:

At the end of Chapter 2 we mentioned some global completeness
problems. A specific instance is the following: Let s1, s2 ∈
RSch and let m ∈ RMs(s1,s2) be a state mapping. Given an
operation q ∈ RQ(s2) such that there is some (set-theoretic)
function f: RStr(s1) -> RStr(s1) which correctly interprets q, is
there some element of RQ(s1)* which also correctly interprets q?

We  introduced  subset  constraints  in  Chapter  5,  and  we  showed 
that  the  given  rules  were  sound.  It  would  be  of  value  to  show 
that  the  rules  were  also  complete.  Of  course,  this  problem  is 
undecidable  when  set  difference  is  included  in  relational  algebra 
since e1 = e2 if and only if e1 ⊆ e2 and e2 ⊆ e1.




Additional  rules  for  structure  mappings  were  introduced  to 
take into account non-loss joins. These rules can be used for
mappings between several levels of views. We indicated that
these  rules  were  sound,  but  completeness  proofs,  if  completeness 
holds,  are  also  needed. 

We removed undecidability by omitting the set difference
operator. Another way to get completeness results is to not
allow  the  same  base  relation  to  appear  more  than  once  in  an 
expression.  Can  we  define  less  restrictive  (and  therefore,  more 
useful)  subsets  of  relational  algebra  for  which  complete  rules 
can  be  given  for  the  properties  we  studied? 

For  operation  mappings,  we  focused  our  study  on  the  type  3 
property  which  said  that  the  mapping  correctly  interprets 
operations  as  long  as  the  constraints  on  the  base  schema  are  not 
violated. The type 1 property was briefly looked at. This is
the property of correct interpretation regardless of constraint
violations. The type 2 property should be investigated along
with the properties of consistency and tr-consistency.

For  the  medium  and  strong  type  3  properties,  we  defined 
recursive  predicates  Noins,  Nodel,  Iok  and  Dok.  We  should  try  to 
prove  that  with  certain  restrictions,  these  predicates  completely 
characterize  the  intended  properties. 

The  area  of  database  mappings  offers  many  opportunities  for 
theoretical  and  practical  research. 




References 


[ABCE] Astrahan M.M., Blasgen M.W., Chamberlin D.D., Eswaran
K.P., Gray J.N., Griffiths P.P., King W.F., Lorie R.A.,
McJones P.R., Mehl J.W., Putzolu G.R., Traiger I.L., Wade B.W.
and Watson V. "System R: Relational Approach to Database
Management" ACM-TODS 1, 2, pp. 97-137 (1976)

[AhBU] Aho A.V., Beeri C. and Ullman J.D. "The Theory of Joins
in Relational Databases" Proc. 18th IEEE Symp. on Foundations
of Computer Science, 1977

[AhSU] Aho A.V., Sagiv Y. and Ullman J.D. "Efficient
Optimization of a Class of Relational Expressions", to appear,
ACM-TODS

[ANSI75] ANSI/X3/SPARC Study Group on Data Base Management
Systems Interim Report, ACM SIGMOD FDT 7, 2 (1975)

[ANSI77] "The ANSI/X3/SPARC Framework: Report of the Study
Group on Data Base Management Systems" D. Tsichritzis and A.
Klug (eds.), AFIPS Press

[Arms] Armstrong W.W. "Dependency Structures of Data Base
Relationships" Information Processing 74, North Holland Pub.
Co., 1974

[deBa] de Bakker J.W. "Semantics and the Foundations of Program
Proving" Proc. IFIP-77, North Holland Pub. Co.

[BeFH] Beeri C., Fagin R. and Howard J.H. "A Complete
Axiomatization for Functional and Multivalued Dependencies in
Database Relations" Proc. ACM-SIGMOD Conf. 1977

[Bern75] Bernstein P.A. "Normalization and Functional
Dependencies in the Relational Data Base Model" T.R. CSRG-60,
Computer Systems Research Group, University of Toronto, 1975




[Bern76] Bernstein P.A. "Synthesizing Third Normal Form
Relations from Functional Dependencies" ACM TODS, 1, pp. 271-298,
1976

[BeSl] Bell J.L. and Slomson A.B. "Models and Ultraproducts: An
Introduction" North Holland Pub. Co., 1971

[BiGl] Biller H. and Glatthaar W. "On the Semantics of Data
Bases: the Semantics of Data Definition Languages" Computer
Science Lecture Notes, Springer-Verlag

[BiGN] Biller H., Glatthaar W. and Neuhold E.J. "On the
Semantics of Data Bases: the Semantics of Data Manipulation
Languages"

[Brod] Brodie M. "Specification and Verification of Data Base
Semantic Integrity" Ph.D. Thesis, University of Toronto, 1978

[BrPP] Bracchi G., Paolini P. and Pelagatti G. "Binary Logical
Associations in Data Modelling" Modelling in Data Base
Management Systems, G.M. Nijssen (ed.), North Holland Pub. Co.
1976

[ChGT] Chamberlin D.D., Gray J.N. and Traiger I.L. "Views,
Authorization, and Locking in a Relational Data Base System"
AFIPS Conference Proceedings, 44, 1975

[Coda] Codasyl Data Base Task Group, April 1971 report, ACM, New
York

[Codd70] Codd E.F. "A Relational Model of Data for Large Shared
Data Banks" CACM, 13, pp. 377-387, 1970

[Codd72a] Codd E.F. "Relational Completeness of Data Base
Sublanguages" Data Base Systems, R. Rustin (ed.), Prentice
Hall, 1972

[Codd72b] Codd E.F. "Further Normalization of the Data Base
Relational Model" Data Base Systems, R. Rustin (ed.), Prentice
Hall, 1972

[Codd74] Codd E.F. "Recent Investigations in Relational Data
Base Systems" Information Processing 74, North Holland Pub.
Co., 1974




[Codd75] Codd E.F. "Understanding Relations" FDT 7, pp. 3-4,
1975

[DaBe] Dayal U. and Bernstein P.A. "On the Updatability of
Relational Views" Fourth International Conference on Very
Large Data Bases, Berlin, 1978

[Dale76] Dale A.G. and Dale N.B. "Schema and Occurrence
Structure Transformations in Hierarchical Systems" Proc. ACM-
SIGMOD 1976

[Dale77] Dale A.G. and Dale N.B. "Main Schema-External Schema
Interaction in Hierarchically Organized Data Bases" Proc. ACM-
SIGMOD 1977

[Dona75] Donahue J.E. "A Child's Guide to Scottery" CSRG TN-1,
University of Toronto

[Dona76] Donahue J.E. "Complementary Definitions of Programming
Language Semantics" Lecture Notes in Computer Science, Vol.
42, Springer-Verlag, 1976

[EGLT] Eswaran K.P., Gray J.N., Lorie R.A. and Traiger I.L.
"The Notions of Consistency and Predicate Locks in a Database
System" CACM 19, 11, pp. 624-633

[EpSW] Epstein R., Stonebraker M. and Wong E. "Distributed
Query Processing in a Relational Data Base System" Proc. ACM-
SIGMOD Conf., 1978

[Fagn] Fagin R. "Multivalued Dependencies and a New Normal Form
for Relational Databases" ACM-TODS 2, 3, pp. 262-278, 1977

[Falk] Falkenberg E. "Significations: The Key to Unify Data
Management" Information Systems, 2, pp. 19-28, 1977

[Fehd] Fehder P.L. "HQL: A Set-Oriented Transaction Language
for Hierarchically-Structured Data Bases" ACM '74, Proceedings
of the Annual Conference, 1974

[Fgin] Fagin R. "Functional Dependencies in a Relational Data
Base and Propositional Logic" IBM Journal of Research and
Development 21, pp. 534-544




[FuSe] Furtado A.L. and Sevcik K.C. "Permitting Updates Through
Views of Data Bases", P.U.C.e. Tech. Rep., Dec. 1977

[GuSh] Joint Guide and Share Database Requirements Group
"Requirements for a Database Management System", November 1970

[Gutt] Guttag J.V. "Abstract Data Types and the Development of
Data Structures" CACM, 20, pp. 396-404, 1977

[Henk] Henkin L. "The Completeness of the First-Order
Functional Calculus" J. Symbolic Logic, 14, pp. 159-166, 1949

[Hewi] Hewitt C. "Description and Theoretical Analysis (using
schemata) of PLANNER: A Language for Proving Theorems and
Manipulating Models in a Robot" Ph.D. Thesis, Dept. of
Mathematics, MIT, 1972

[InAl] "An Information Algebra" Phase I Report -- Language
Structure Group of the CODASYL Development Committee, CACM 5,
pp. 190-204 (1962)

[KeKT] Kerschberg L., Klug A. and Tsichritzis D. "A Taxonomy of
Data Models" Systems for Large Data Bases, Lockemann & Neuhold
(eds.), North Holland Pub. Co., 1976

[Kent] Kent W. "New Criteria for the Conceptual Model" Systems
for Large Data Bases, Lockemann & Neuhold (eds.), North
Holland Pub. Co., 1976

[Klee] Kleene S.C. "Introduction to Metamathematics" D. Van
Nostrand Co. Inc., Princeton, 1950

[KlTs] Klug A. and Tsichritzis D. "Multiple View Support within
the ANSI/SPARC Framework" VLDB-3

[LoTs] Lochovsky F. and Tsichritzis D. "An Educational Data
Base Management System" INFOR 14, 3, pp. 270-278

[Mink] Minker J. "Performing Inferences over Relational Data
Bases" Proc. ACM-SIGMOD Conf., 1975

[Nijs76] Nijssen G.M. "A Gross Architecture for the Next
Generation Database Management Systems" Modelling in Data Base
Management Systems, G.M. Nijssen (ed.), North Holland Pub.
Co., 1976




[Nijs77] Nijssen G.M. "On the Gross Architecture for the Next
Generation Database Management Systems" Information Processing
77, North Holland Pub. Co., 1977

[Nssn76] Nijssen G.M. (ed.) Modelling in Data Base Management
Systems, North-Holland, 1976

[Nssn77] Nijssen G.M. (ed.) Architecture and Models in Data Base
Management Systems, North-Holland, 1977

[PaPe] Paolini P. and Pelagatti G. "Formal Definition of
Mappings in a Data Base" Proc. ACM-SIGMOD Conf. 1977

[PePB] Pelagatti G., Paolini P. and Bracchi G. "Mappings in
Database Systems" Information Processing 77, North Holland
Pub. Co., 1977

[RoLe] Robinson L. and Levitt K.N. "Proof Techniques for
Hierarchically Structured Programs" CACM 20, pp. 271-283

[Schd] Schmid H.A. "An Analysis of Some Constructs for
Conceptual Models" (to appear, Information Systems)

[Schm] Schmidt J.W. "Some High-Level Constructs for Data of
Type Relation" ACM TODS 2, 3, pp. 247-261

[SDTG] Stored-Data Definition Task Group, "Stored-data
Description and Data Translation: A Model and Language"
Information Systems, 2, #2, 1977

[SWKH] Stonebraker M., Wong E., Kreps P. and Held G. "The
Design and Implementation of INGRES" ACM-TODS 1, pp. 189-222

[Shoe] Shoenfield J.R. "Mathematical Logic", Addison-Wesley,
Reading-London, 1967

[SHTG] Shu N.C., Housel B.C., Taylor R.W., Ghosh S.P. and Lum
V.Y. "EXPRESS: A Data Extraction, Processing and Restructuring
System" ACM TODS 2, 2, pp. 134-174

[Solo] Solomon M. "Undecidability of the Equivalence Problem
for Relational Expressions" Bell Laboratories, to appear




[Stee] Steel T.B. Jr. "Formalization of Conceptual Schemas"
IFIP TC-2 Working Conf. on Modelling in Data Base Management
Systems, 1976

[Ston] Stonebraker M. "Implementation of Integrity Constraints
and Views by Query Modifications" Proc. ACM-SIGMOD Conf. 1975

[Su] Su S.Y.W. "Application Program Conversion Due to Database
Changes" Proc. 2nd International Conf. on Very Large Databases
1976, pp. 143-157

[SuLi] Su S.Y.W. and Liu B.J. "A Methodology of Application
Program Analysis and Conversion Based on Database Semantics"
Proc. ACM-SIGMOD Conf. 1977

[Tenn] Tennent R.D. "The Denotational Semantics of Programming
Languages" CACM 19, 8, pp. 437-453 (1976)

[Todd] Todd S. "Automatic Constraint Maintenance and Updating
Defined Relations" Information Processing 77, North Holland
Pub. Co., 1977

[Tsic] Tsichritzis D. "LSL: A Link and Selector Language"
Proc. ACM-SIGMOD Conf., 1976

[Vass] Vassiliou Y. "Universal Terminal" private communication,
University of Toronto, 1977

[WoMy] Wong H.K.T. and Mylopoulos J. "Two Views of Data
Semantics: A Survey of Data Models in Artificial Intelligence
and Database Management" INFOR, 15, pp. 344-383, 1977




APPENDIX 


Proofs of Selected Theorems


Some of the longer proofs of Parts III and IV appear (without
discussion) in this appendix.

Theorem 4.2. Let s ∈ RSch, e ∈ Exp(s), and consider the
following property P with respect to sets S and S1 of constraints
on e:

If c is an EQ, then c ∈ S if and only if c ∈ S1, and
id:Z->A ∈ S if and only if for some Z1 ⊆ Z, Z1->A ∈ S1.

Then Drv(e) and Drv1(e) have property P.

Proof. Clearly, if id:Z1->A ∈ Drv1(e) and Z1 ⊆ Z, then id:Z1->A
∈ Drv(e), and by rule [10], id:Z->A ∈ Drv(e). Also, if c is an
EQ and c ∈ Drv1(e), then it is clear that c ∈ Drv(e). This
proves the "if" part.

We next show that if S and S1 have property P, then so do
Cl(S) and Cl1(S1).

Clauses [1] through [8] in Definition 4.12 are no problem.
Suppose id1:Z->A1, id2:Z->A2 ∈ Cl(S) and Rdc(id1) = Rdc(id2).
Then, by induction, there are sets Z1 ⊆ Z, Z2 ⊆ Z such that
id1:Z1->A1, id2:Z2->A2 ∈ Cl1(S1). However, we know that Z1 =
leaves(id1) = leaves(Rdc(id1)) = leaves(Rdc(id2)) = leaves(id2) =
Z2. Hence id1:Z1->A1, id2:Z1->A2 ∈ Cl1(S1) and
Rdc(id1) = Rdc(id2), so A1 = A2 ∈ Cl1(S1).

For clause [11], suppose id1:Z->A, id2:A+X->B ∈ Cl(S). By
induction, we may assume there are sets Z1 ⊆ Z, Z2 ⊆ A+X such
that id1:Z1->A and id2:Z2->B ∈ Cl1(S1). If A ∈ Z2, then
id3:Z1+(Z2-A)->B ∈ Cl1(S1) and Z1+(Z2-A) ⊆ Z+X. If A ∉ Z2, then
id2 does not have any leaf labelled A, and id3 equals id2. Thus
we have Z2 ⊆ X ⊆ Z+X and id3:Z2->B ∈ Cl1(S1).



Now  we  prove  the  statements  of  the  theorem  by  induction  on  e. 
When  e  is  a  base  relation,  the  proof  is  clear  from  the  above 
arguments. 

Consider Drv(e[X]). If id2:Z->A ∈ Drv(e[X]), then
id1:X[Z]->X[A] ∈ Drv(e); hence for some X1 ⊆ X[Z], id1:X1->X[A] ∈
Drv1(e), where id2 is obtained from id1 by attaching to each leaf
of id1 labelled X[n] the arc n-> and to the root X[A] the arc
->A. We may write X1 as X[Z1] where Z1 ⊆ Z. Then id2:Z1->A ∈
Drv1(e[X]).

Now consider Drv(e[X=Y]) = Cl(Drv(e) + {X=Y}). By the induction
hypothesis, Drv(e) and Drv1(e) have property P. Therefore
Drv(e) + {X=Y} and Drv1(e) + {X=Y} will also have this property. By
what we have shown above, Cl(Drv(e) + {X=Y}) and Cl1(Drv1(e) + {X=Y})
will have property P, and these sets are, of course, Drv(e[X=Y])
and Drv1(e[X=Y]).

The  cases  for  selection  and  cross  product  are  similar  to 
restriction. 

For union, we have {x ∈ Drv(e1)·Drv(e2) : x is an EQ} = {x ∈
Drv1(e1)·Drv1(e2) : x is an EQ}. If id1:Z->A ∈ Drv(e1) and
id2:Z->A ∈ Drv(e2) with Rdc(id1)=Rdc(id2), then there are subsets
Z1 ⊆ Z, Z2 ⊆ Z with id1:Z1->A ∈ Drv1(e1) and id2:Z2->A ∈
Drv1(e2). Then Z1 = leaves(id1) = leaves(Rdc(id1)) =
leaves(Rdc(id2)) = leaves(id2) = Z2. Hence id1:Z1->A ∈
Drv1(e1+e2).

□

Theorem 4.4. Let s ∈ RSch; let e ∈ Exp(s), and suppose id1 and
id2 are reduced identifiers which are associated with
(identifiers of) FDs in some Drv1 set. Then id1≠id2 implies that
there is a state st and a valuation x such that
id1(st;x) ≠ id2(st;x).

Proof. For this proof we assume that the schema contains no EQs
on base relations. A generalization to take EQs into account
would not be difficult.

For each FD R:Z->A derivable on each base relation R in s
associate a unique symbol 'f'. For each integer node label i
associate a unique symbol 'vi'. Consider the following function
(with side-effects):




function formeval(id)
    if id is an integer leaf i then 'vi'
    else if id is a literal leaf 'v' then 'v'
    else if succ(id) is an integer node then formeval(succ(id))
    else /* succ(id) is a relation name R; root(id) is an
            integer A, and the set of children of R is
            {id1,...,idn} whose roots are labelled Z1,...,Zn */
        v1 <- formeval(id1);
        ...
        vn <- formeval(idn);
        add the tuple t to R(str) such that t[Zi]=vi and
            if A ∉ {Z1,...,Zn} then
                t[A] = 'f(v1,...,vn)' where
                'f' is the symbol associated with
                R:Z1...Zn->A and where t is blank in other places;
        return 'f(v1,...,vn)'
    fi fi fi end

(formeval produces what is essentially a Henkin
interpretation [Henk]. The difference is that the interpretations
for  symbols  associated  with  different  FDs  of  the  same  relation 
have  been  expanded  with  blanks  and  combined  into  one  set  of 
tuples.) 
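
The recursion in formeval can be sketched in Python as follows. This is only an illustration under stated assumptions: the classes Leaf, Node and Rel, the representation of a blank as a missing dictionary key, and the spelling of the formal symbol 'f' as f[R:Z1,...,Zn->A] are all hypothetical choices, not notation from the text. The children of a relation node are assumed to be integer leaves or integer nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    label: Union[int, str]      # integer domain label, or a literal value
    literal: bool = False


@dataclass
class Rel:
    name: str                   # relation name R
    children: List["Node"]      # subtrees whose roots are labelled Z1..Zn


@dataclass
class Node:
    label: int                        # integer label A of this node
    succ: Union["Node", Leaf, Rel]    # the single successor of the node


def formeval(tree, state: Dict[str, List[Dict[int, str]]]) -> str:
    """Return the formal value of `tree`, adding tuples to `state` (str)."""
    if isinstance(tree, Leaf):
        # literal leaf 'v' yields 'v'; integer leaf i yields 'vi'
        return str(tree.label) if tree.literal else f"v{tree.label}"
    if not isinstance(tree.succ, Rel):
        # successor is an integer node (or leaf): pass its value through
        return formeval(tree.succ, state)
    rel = tree.succ
    zs = [child.label for child in rel.children]
    vs = [formeval(child, state) for child in rel.children]
    t = dict(zip(zs, vs))       # t[Zi] = vi; absent keys play the role of blanks
    # hypothetical spelling of the symbol associated with R:Z1...Zn->A
    symbol = f"f[{rel.name}:{','.join(map(str, zs))}->{tree.label}]"
    value = f"{symbol}({','.join(vs)})"
    if tree.label not in zs:
        t[tree.label] = value
    state.setdefault(rel.name, []).append(t)
    return value
```

For an identifier with root 3 over R with integer leaves 1 and 2, the call adds one tuple to R(str) whose third component is the formal expression returned.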

We do not apply the formeval function directly to id1 and id2
because it is necessary to reapply the Rdc operator to a
combination of id1 and id2 in order to get further reductions.
This we accomplish with the artifice of introducing a new
relation symbol: Let Q be a ternary relation not appearing in
id1 or id2 having an FD Q:1,2->3. Let id3 be the identifier:

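[Figure: the identifier id3, a tree with root 3 over the relation
node Q, with id1 and id2 as the subtrees below Q.]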


Then Rdc(id3) will have the form:

[Figure: a tree with root 3 over the relation node Q, with id1'
and id2' as the subtrees below Q.]

and we will have id1'≠id2', and for any state st and valuation x,
if id3(st;x) is defined, then id1'(st;x)=id1(st;x) and
id2'(st;x)=id2(st;x). These properties hold because only clause
(ii) of Rdc can be applied since id1 and id2 are themselves
reduced.

Now  we  define  the  structure  str  by  considering  it  as  a  global 
variable  of  the  function  call: 

dummy <- formeval(Rdc(id3)).

Now we specify how to replace blanks in R(str), for each R ∈
s, by values. Basically, we use FDs on R and values already
present to infer values for blanks. It is convenient to give two
equivalent procedures:

(I) There are two stages. Stage 1: Let t ∈ R(str) and
suppose t[X] is blank and that t arose from the FD Z->A. If
there is a subset Z1 ⊆ Z and a tuple t1 ∈ R(str) arising from the
FD Z1->X such that t[Z1]=t1[Z1], then let t[X]=t1[X]. If there
is a subset Z1 ⊆ Z and an FD Z1->X but no such tuple t1, then
give t[X] a value 'f(z1,...,zn)', where 'f' is a new symbol and
(z1,...,zn) = t[Z1]. Repeat this step as long as possible.

Stage 2: Replace every remaining blank by a new (unique)
value.

(II) There are two stages. Stage 1: Let t ∈ R(str) and
suppose t[X] is blank. If there is an FD Y->X such that t[Y] is
fully defined and such that there is a tuple t' in R(str) with
t'[Y+X] fully defined and with t[Y]=t'[Y], then let t[X]=t'[X].
Repeat this step as long as possible.

Stage 2: Replace every remaining blank by a new (unique)
value.
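
Procedure (II) can be sketched in Python as below. This is a minimal illustration, not the construction used in the proof: tuples are assumed to be dictionaries from domains to values with None standing for a blank, FDs are assumed to be given as pairs (Y, X) with Y a tuple of domains, and the fresh stage 2 values are assumed not to collide with values already present.

```python
from itertools import count


def fill_blanks(tuples, fds):
    """Fill blanks (None) in `tuples` following the two stages of procedure (II)."""
    changed = True
    while changed:                      # stage 1: repeat as long as possible
        changed = False
        for t in tuples:
            for ys, x in fds:
                # need t[X] blank and t[Y] fully defined
                if t[x] is not None or any(t[y] is None for y in ys):
                    continue
                for other in tuples:
                    if other is t:
                        continue
                    # other[Y+X] fully defined and agreeing with t on Y
                    if other[x] is not None and all(other[y] == t[y] for y in ys):
                        t[x] = other[x]
                        changed = True
                        break
    fresh = count(1)                    # stage 2: new unique values
    for t in tuples:
        for dom in t:
            if t[dom] is None:
                t[dom] = f"new{next(fresh)}"
    return tuples
```

With the single FD 1->2 and two tuples agreeing on domain 1, stage 1 copies the defined value of domain 2 into the tuple where it is blank, and stage 2 fills whatever is left.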

Clearly, every assignment made by procedure (I) can be made by
procedure (II). Conversely, it can be shown by induction that
every assignment made by procedure (II) can be made by procedure
(I).


First we note that with the valuation x=('v1','v2',...), we
have id1(str;x)≠id2(str;x) since the values id1(str;x) and
id2(str;x) are formal functional expressions in one-to-one
correspondence with reduced FD identifiers.

We must show that str is a state. We want to show that no FDs
are violated. To do this we will show that, before and after
every blank removal, no FDs are violated (by nonblank
entries) and that there are no conflicting choices in the
assignment of any remaining blanks. This we prove by induction
on the number of blanks removed.

We first show that no FDs are violated by stage 1. First
assume no blanks have been removed. Suppose that t1 and t2 are
tuples in R(str) (with all its blanks) and arise from the FDs
R:Z1->A1 and R:Z2->A2, respectively, and that these FDs
correspond to the formal symbols f1 and f2.

First assume that Z1->A1 is the same as Z2->A2. The FD Z1->A1
will not be violated by t1 and t2 because if t1[Z1]=t2[Z1], then
t1[A1] and t2[A1] will be the same formal expression. An FD
W->A1, where W ⊆ Z1 (and W≠Z1) cannot occur for otherwise id1 or
id2 could not be associated with an FD from a Drv1 set. The only
other possible FDs for which t1 and t2 are completely defined are
A1->Y for some Y ∈ Z1. Now t1[A1] and t2[A1] are formal
expressions of the form 'f1(x1,...,xn)' and 'f1(y1,...,yn)'. If
t1[A1]=t2[A1], then these formal expressions are the same, and so
xi=yi for each i=1,...,n. Therefore t1[Y]=t2[Y].

Now suppose Z1->A1 is not the same as Z2->A2. Let Z->A be any
FD in s (with A ∉ Z) for which t1 and t2 are completely defined.
Then Z+A ⊆ Z1+A1 and Z+A ⊆ Z2+A2. We cannot have Z+A ⊆ Z1 since
then (Z1-A)->A1 and this contradicts the assumptions on id1 and
id2. So we must have either A1=A or A1 ∈ Z. The case A1=A
implies Z1=Z because we know Z ⊆ Z1, and a proper subset is not
allowed by the assumptions on id1 and id2. The case A1 ∈ Z
implies A ∈ Z1. Similarly, we conclude that either Z->A equals
Z2->A2 or A2 ∈ Z and A ∈ Z2. We thus have four cases to
consider: The case where Z->A equals Z1->A1 and Z2->A2 has
already been considered, and we are assuming Z1->A1 does not
equal Z2->A2. In the second case, Z->A is Z1->A1 and A2 ∈ Z1, A1
∈ Z2. The only way Z->A, i.e., Z1->A1 could be violated by t1
and t2 is to have t1[Z1]=t2[Z1]. Now let us consider the portion
of id1 or id2 from which t1 arose. It will have the form:


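[Figure: a subtree with root A1 over the relation node R, whose
children are labelled Z11, ..., Z1i, ..., Z1n.]

where Z1i=A2. Now t1[A2]=t2[A2] which means that the tree
descendent from Z1i has the form:

[Figure: a subtree with root A2 over the relation node R, whose
children are labelled by the domains of Z2.]

Thus we have a subtree in id1 or id2 of the form:

[Figure: the first subtree above, in which the child Z1i (= A2) is
itself the root of the second subtree.]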


Also, since Z1-A2 ⊆ Z2 and since t1[Z1-A2]=t2[Z1-A2], the
descendants in the above tree of Z11,...,Z1i-1,Z1i+1,...,Z1n are
the same as the descendants of the corresponding Z2s. This,
however, contradicts the fact that id1 and id2 are reduced, since
a reduction could be made according to clause (iii) of Rdc. Thus
we can never have t1[Z1]=t2[Z1] and Z->A cannot be violated in
this case.

The third case, where A1 ∈ Z, A ∈ Z1 and Z->A equals Z2->A2, is
analogous to the above case.

In the fourth case, A1 ∈ Z, A ∈ Z1, A2 ∈ Z and A ∈ Z2. If
t1[Z]=t2[Z], then t1[A1]=t2[A1] and t1[A2]=t2[A2]. If A1=A2,
then we must have t1[Z1]=t2[Z2] (and hence t1[A]=t2[A]) since
this component of t1 and of t2 is a formal functional expression.
If A1≠A2, we would have a subtree in id1 or id2 corresponding to
t1 having a subtree corresponding to t2 which has a subtree
corresponding to t1, ... etc. This, of course, is impossible,
and this case will not occur.

We have shown that no FDs are violated in str before any
blanks are filled in. Now we show that there will not be any
conflicting choices for the first assignment.

So suppose t ∈ R(str); t is associated with the FD Z->A, and
t[X] is blank. Further suppose that Z1 ⊆ Z, Z2 ⊆ Z, that there
are tuples t1 corresponding to Z1->X and t2 corresponding to
Z2->X in str, and that t[Z1]=t1[Z1] and t[Z2]=t2[Z2]. We want to
show that t1[X]=t2[X]. The hypotheses mean that there are
subtrees id1", id2" and id3" in id1 or id2 of the respective
form:

[Figure: three subtrees, each a root over the relation node R:
root A with children labelled by the domains of Z; root X with
children labelled by the domains of Z1; root X with children
labelled by the domains of Z2.]

such that the descendants of corresponding Zs are the same. From
clause (ii) of Rdc we see that we must have Z1->X equal to Z2->X
since otherwise Rdc would have changed either id2" or id3".
(This is why we needed to construct id3 at the start of this
proof.) Thus t1[X]=t2[X], and there is only one possible value to
assign to t[X].

Next  comes  the  induction  step. 

Suppose that domain X in tuple t has just been assigned by the
FD Z1->X, where t is associated with the FD Z->A and Z1 ⊆ Z. Any
FDs not involving X certainly will not be violated. Consider an
FD of the form Y+X->B. Since t[X] is a formal functional
expression involving the values of t in the Z1-columns, if there
is another tuple t' such that t[Y+X]=t'[Y+X], then we know that
t[Z1]=t'[Z1]. From this we see that the FD Y+Z1->B, which does
not involve X, is such that t[Y+Z1]=t'[Y+Z1]. By induction, we
have t[B]=t'[B] so this FD is not violated.

Consider an FD of the form Y->X. If there is a tuple t' such
that t[Y]=t'[Y], but t[X]≠t'[X], then at the previous step there
were conflicting choices for assignment to X. Hence Y->X is not
violated.

To show that there are no conflicting choices for the next
assignment, suppose domain W in t is blank. Suppose also that
there are tuples t' and t" and FDs Y'->W and Y"->W such that
t[Y']=t'[Y'] and t[Y"]=t"[Y"], but that t'[W]≠t"[W]. If X is in
Y', then, since t[X] is a formal expression, we can replace Y' by
(Y'-X)+Z1 and still have t[(Y'-X)+Z1]=t'[(Y'-X)+Z1]. The same
thing holds if X is in Y". This means we can use the induction
hypothesis to conclude that t'[W]=t"[W], and therefore that there
are no conflicting choices for the next assignment.

We have shown that stage 1 produces no FD violations. Now we
show that stage 2 will not produce any FD violations. Suppose t
and t' are in R(str) and there is an FD Z->A such that t[Z+A]
contains at least one value from stage 2. One of these values
must be in a domain in Z since otherwise the values of t[Z+A]
would have been assigned by stage 1. Therefore it is simply not
possible to have t[Z]=t'[Z], and so t and t' cannot violate Z->A.

□

Theorem 5.6. Let s ∈ RSch; let e ∈ Exp(s) according to syntax IV
be consistent. Then any valid EQ on e is a member of Drv1(e),
and if Z->A is valid on e, then for some Z1 ⊆ Z, Z1->A ∈ Drv1(e).

Proof. Proofs of completeness theorems generally follow the
contrapositive direction: If c is not a member of Drv1(e), i.e.,
if c is not derivable by the given rules, then c is not valid,
i.e., there is some state st such that c is false in e(st). The
state st is called a counterexample state. In the proofs of
completeness for FDs and for MVDs [BeFH], the counterexample state
is a single set of tuples since only one relation is being dealt
with. In the present situation there are (possibly) many
relations, each of which must be assigned a set of tuples for the
counterexample state. This cannot be done in as straightforward
a fashion as in the case for FDs and MVDs because the expression
e may have several occurrences of the same base relation. For
example, suppose e has the form e1⊗e2 and that base relation R
appears in both e1 and e2. If we try to arrive at an assignment
of tuples to R by a recursive procedure (which is natural since
relational algebra expressions are recursively defined), the
procedure call on e1 may result in an assignment to R which
conflicts with that of the procedure call on e2.

With these remarks in mind, we now proceed by induction on the
length of e.

First consider a base relation R. We first define a set of
equivalence classes of domains {Ei : 1≤i≤n} determined by the EQ
constraints in s. This will allow us to remove EQs from
consideration and to utilize existing theorems on the
completeness of rules for FDs. That is, X and Y are in Ei if and
only if X=Y ∈ Drv1(R). For each set Ei we associate a value Vi:
If X ∈ Ei and X=V ∈ Drv1(R), then Vi=V. (This value will be
unique if it exists since R is consistent.) Otherwise, Vi is an
arbitrary, unique integer (hence, i≠j implies Vi≠Vj). We first
define a state st1 such that R(st1) contains only one tuple t.
Namely, t is such that t[X]=Vj if and only if X ∈ Ej. Note that
since R(st1) has a cardinality of one, all FDs are true in
R(st1). Now, if X=Y is not derivable, then for some i and j, X ∈
Ei, Y ∈ Ej, and i≠j. Hence (in R(st1)), t[X]=Vi, t[Y]=Vj, and so
t[X]≠t[Y]. The state st1 is therefore a counterexample state
showing that X=Y is not valid.

Similarly, if X=V is not derivable, then Vi≠V, where X ∈ Ei
and t[X]=Vi, so t[X]≠V, again showing that st1 is a counterexample
to the validity of X=V.
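
The one-tuple state st1 can be sketched in Python as below. The sketch assumes the derivable EQs are supplied as pairs of domains and the derivable VEQs as a dictionary from domain to value (both hypothetical input formats), and it assumes R is consistent, so that each class receives at most one VEQ value. Arbitrary values are tagged tuples so that they cannot coincide with a VEQ constant.

```python
from itertools import count


def one_tuple_state(domains, eqs, veqs):
    """Build the single tuple t with t[X] = Vi for X in the class Ei."""
    parent = {d: d for d in domains}

    def find(d):
        # follow parent links to the representative of d's class
        while parent[d] != d:
            d = parent[d]
        return d

    for x, y in eqs:                 # merge classes along derivable EQs
        parent[find(x)] = find(y)

    class_value = {}
    for x, v in veqs.items():        # Vi = V when X=V is derivable for X in Ei
        class_value[find(x)] = v

    fresh = count(1)
    t = {}
    for d in domains:
        root = find(d)
        if root not in class_value:  # otherwise an arbitrary unique value
            class_value[root] = ("arb", next(fresh))
        t[d] = class_value[root]
    return t
```

Two domains receive equal values in t exactly when they lie in the same class, so any EQ X=Y that is not derivable fails in the resulting state.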

Now consider an FD X->Y such that for no X1 ⊆ X is X1->Y ∈
Drv1(R). Using the sets {Ei : 1≤i≤n} defined above we define a
function g which maps FDs on R to FDs on an n-ary relation R'.
First, we write g(X) = i if X ∈ Ei and Vi was an arbitrary value;
otherwise g(X)=∅ (where Vi was assigned because X=Vi was
derivable). Then for a set Z=Z1...Zn of domains, g(Z) =
g(Z1)+...+g(Zn) (where + is union), and finally, g(Z->A) =
g(Z)->g(A). We let s' be the schema consisting of R' and also
every FD g(Z->A) where R:Z->A ∈ s. First we note that g(X)->g(Y)
is not derivable in s', i.e., there is no X' ⊆ g(X) such that
X'->g(Y) ∈ Drv1(R'). (X->Y is the given non-derivable FD.) Since
the reflexive and pseudotransitivity rules are complete for FDs
on one relation (with no EQs) [Arms], there exists a state st2
such that g(X)->g(Y) is false in R'(st2). We now use g to
construct a counterexample state st3 for R:X->Y. To do this we
define a function h, an inverse of g, which, from tuples of
R'(st2), will yield tuples of R(st3). Namely, if t' ∈ R'(st2),
then h(t') is the tuple t such that t[X]=V if X=V is derivable,
and t[X]=t'[i] when Vi was arbitrary and X ∈ Ei. Then we let
R(st3) = h(R'(st2)). It is not hard to see that all constraints
in s are true in R(st3) and that X->Y is false in R(st3).

Now suppose that e is of the form R[X1=V1], where X1 and V1
are lists of domains and values, respectively. By adding the
VEQs X1=V1 to s to get a new schema s', we may proceed as above,
where counterexample states for R in s' will be counterexample
states for R[X1=V1] in s.

We next assume that the expression e is a cross product of
selections or other cross products. Suppose that Z->A is such
that Z1->A ∉ Drv1(e) for every Z1 ⊆ Z. Let Z1 ⊆ Z be the domains
of e appearing in the same selection term that A appears in. (Z1
may be empty.) Let us also assume that this term is first in e,
i.e., that we can write e as R[X1=V1]⊗e2. Then for no Z2 ⊆ Z1 is
Z2->A ∈ Drv1(R[X1=V1]) for otherwise we would have Z2->A ∈
Drv1(e). By the induction hypothesis, there is a state st1 such
that Z1->A is false in R[X1=V1](st1). If e2(st1) (the remainder
of e) is nonempty, then Z->A will also be false in e(st1), for if
t1 and t2 are tuples of R[X1=V1](st1) contradicting Z1->A and if
t' is any tuple of e2(st1), then t1⊗t' and t2⊗t' are tuples in
e(st1) contradicting Z->A. It remains to show that if e2(st1)=∅,
we can modify the state to get e2(st1) nonempty while
retaining tuples in R[X1=V1] contradicting Z1->A.
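
The step from a violation in the first term to a violation in the cross product can be checked on a small instance. The Python sketch below is illustrative only: relations are lists of Python tuples, domains are 0-based positions rather than the 1-based domains of the text, and the helper names violates and cross are hypothetical.

```python
def violates(rel, lhs, rhs):
    """True if two tuples of rel agree on the positions lhs but differ on rhs."""
    return any(
        all(t1[z] == t2[z] for z in lhs) and t1[rhs] != t2[rhs]
        for t1 in rel
        for t2 in rel
    )


def cross(r1, r2):
    """Cross product: concatenate every tuple of r1 with every tuple of r2."""
    return [a + b for a in r1 for b in r2]


# first term: two tuples contradicting the FD 0 -> 1
first = [("a", "x"), ("a", "y")]
# remainder e2: nonempty, so each violating tuple is extended by the same t'
rest = [("p",)]

product = cross(first, rest)
# the FD {0, 2} -> 1 on the product is violated by the extended tuples
product_violated = violates(product, [0, 2], 1)
# with an empty remainder the product is empty and nothing is violated
empty_violated = violates(cross(first, []), [0, 2], 1)
```

The second computation shows why the proof must first make e2(st1) nonempty: an empty factor erases the counterexample.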

First consider the subexpression r of e consisting of all the
selections on the base relation R: r = R[X1=V1]⊗...⊗R[Xn=Vn].
We will write this as s1⊗...⊗sn. Before we proceed to modify st1
so that r is nonempty, we prove a lemma: Note that every domain
Y in r can be written di+X where 1≤X≤deg(R) and di (a
"displacement") equals deg(R)·(i-1), where Y is a domain in the
term si. Then we have:

(i) if di+X=dj+Y ∈ Drv1(r), i≠j, X≠Y, and no VEQ di+X=V or
dj+Y=V is derivable, then di+X=dj+X and di+Y=dj+Y are in
Drv1(r), and X=Y is in Drv1(R);

(ii) if di+X=V ∈ Drv1(r), then ∅->X ∈ Drv1(si), and

(iii) if di+X=dj+X ∈ Drv1(r), then ∅->X ∈ Drv1(si)·Drv1(sj).
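
The displacement notation di+X used in the lemma is simple arithmetic on domain numbers. A small Python sketch, with hypothetical function names, shows the correspondence between a domain Y of r and the pair (i, X):

```python
def displacement(i, deg):
    """di = deg(R) * (i - 1), the offset of the i-th selection term."""
    return deg * (i - 1)


def split_domain(y, deg):
    """Write domain y of r as (i, X) with y = di + X and 1 <= X <= deg."""
    i = (y - 1) // deg + 1
    return i, y - displacement(i, deg)


def join_domain(i, x, deg):
    """Inverse of split_domain: the domain di + X of r."""
    return displacement(i, deg) + x
```

For a relation of degree 3, domain 5 of r is domain 2 of the second term, since d2 = 3.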

To prove (i), note that rule [3] (transitivity) is the only
one which can generate the indicated EQs. So we use induction on
the number of applications of rule [3].

There are only two possible pairs of EQs to which transitivity
can be applied and which themselves are not the result of
transitivity. One pair consists of di+X=dk+X, which would be the
result of rule [9], and dk+X=dj+Y, which would be the result of
an EQ X=Y derivable in si and sj. From X=Y we can get di+Y=di+X,
which yields di+Y=dj+Y derivable in r, and also dj+Y=dj+X, which
yields di+X=dj+X derivable in r. Now the EQ X=Y must, in fact,
be derivable in R, for the only other way to get an EQ in a
selection on a base relation is to derive it from VEQs which we
assumed impossible.

The  other  pair  of  EQs  is  like  the  one  above  with  the  roles  of 
X  and  Y  interchanged. 

Now assume the hypotheses hold and also that there are EQs
di+X=dk+Z and dk+Z=dj+Y derivable in r. First assume X≠Z≠Y. We
cannot have any VEQs dk+Z=V since this would yield di+X=V and
dj+Y=V. By induction, we have dk+Z=dj+Z and di+Z=dk+Z derivable
in r, and X=Z and Y=Z derivable in R. We therefore have X=Y
derivable in R and di+Z=dj+Z derivable in r, and from this we get
di+X=dj+X and di+Y=dj+Y derivable in r. If Z is X, then we have
di+X=dk+X and dk+X=dj+Y derivable in r. By induction on the
second EQ, we have dk+X=dj+X and dk+Y=dj+Y derivable in r and X=Y
derivable in R. From this we derive di+X=dj+X and di+Y=dj+Y in
r. If Z is Y we proceed analogously.

To prove part (ii), first assume X=V ∈ Drv1(si). Then,
clearly, ∅->X ∈ Drv1(si). If di+X=dj+Y and dj+Y=V are in Drv1(r)
and i≠j and X≠Y, then by (i), di+X=dj+X ∈ Drv1(r). By (iii),
∅->X ∈ Drv1(si). If i=j or X is Y, we also get ∅->X ∈ Drv1(si).

To prove (iii), we have that if di+X=V and dj+X=V are in Drv1(r),
then ∅->X ∈ Drv1(si)·Drv1(sj) by (ii). If di+X=dk+Y and
dk+Y=dj+X are in Drv1(r), and X≠Y, then by (i), di+X=dk+X and
dk+X=dj+X are in Drv1(r), so by induction, ∅->X ∈ Drv1(si) and
∅->X ∈ Drv1(sj). If the EQ is the result of rule [9], then there
must be an FD Z->X ∈ Drv1(R) such that di+Z=dj+Z is in Drv1(r).
By induction, ∅->Z ∈ Drv1(si)·Drv1(sj), and therefore ∅->X ∈
Drv1(si)·Drv1(sj).

This  proves  the  lemma. 

We now will indicate how to modify st1. We may suppose that
R(st1) = R[X1=V1](st1) = {t1,t2}. (If not, delete the extra
tuples; they do not add anything.) First let u be a 1:1 function
defined on values appearing in t1 and t2 such that its image is
distinct from the values in t1 and t2 and from the values
appearing in the selection terms of r. Let E1,...,Eq be the
equivalence classes of the domains of r under "=". Associate a
value Vi with each Ei as follows: Vi=V if X=V ∈ Drv1(r) for some
X ∈ Ei; if there is some X ∈ Ei such that 1≤X≤deg(R) (i.e., X is a
domain of s1), then Vi=u(t1[X]); otherwise Vi is an arbitrary
unique value. Define R(st2) = {t1',t2'} as follows: t1'[X] = Vk
if X ∈ Ek and Vk is assigned by a VEQ; t1'[X] = u(t1[X]),
otherwise. Also, t2'[X] = Vk if X ∈ Ek and Vk is assigned by a
VEQ; t2'[X] = u(t2[X]), otherwise.

We first show that st2 is a state, that t1' and t2' appear in
s1, and that st2 is still a counterexample state for Z1->A.

If R:X=V is in s, or if X=V is a component of X1=V1, then Vk
is assigned by a VEQ, where X ∈ Ek and Vk=V. Hence t1'[X] =
t2'[X] = V. Thus R(st2) satisfies the VEQs of s, and
R[X1=V1](st2) = R(st2).

For the other properties of R(st2), we will show that t1'[X] =
t2'[X] if and only if t1[X] = t2[X]. First, it is clear from the
definition that t1[X]=t2[X] implies t1'[X]=t2'[X]. Now suppose
t1'[X]=t2'[X]. If Vk is not assigned by a VEQ, where X ∈ Ek,
then t1'[X] = u(t1[X]) and t2'[X] = u(t2[X]), and since u is 1:1,
we get t1[X]=t2[X].

Now suppose t1'[X] = Vk = t2'[X], where X ∈ Ek and Vk is
assigned by a VEQ. This means that X=Vk is derivable in r. By
part (ii) of the lemma, ∅->X is in Drv1(s1). Since t1, t2 ∈
R[X1=V1](st1), we have t1[X]=t2[X]. This proves t1[X]=t2[X] if
and only if t1'[X]=t2'[X]. From this we can conclude that FDs
and DEQs which are true (false) in R(st1) are true (false) in
R(st2). Thus st2 is a state of s, and Z1->A is false in
R[X1=V1](st2).

The next step is to show that we can add tuples to R(st2) to
get a nonempty state for every other selection in r.

Consider a selection term R[Xi=Vi] (i=2,...,n) in r of
displacement di; that is, the domains of the selection are
di+1,...,di+deg(R). Define a tuple ti by ti[X]=Vk where di+X ∈
Ek. By construction, if ti is placed in R(st2), then ti will
appear in R[Xi=Vi]. It is also easy to see that ti will satisfy
all EQs in schema s. We must show that when t2,...,tn, so
constructed, are added to R(st2), no FDs are violated.

First we show that there will be no FD violations among
t2,...,tn. So suppose R:W->B is in s and ti[W]=tj[W]. Then for
each component Wk of W, ti[Wk]=tj[Wk]. If one of the values was
assigned by a VEQ Vw or was assigned arbitrarily, then both di+Wk
and dj+Wk are in Ew since this Vw is distinct from all other
values associated with E-sets. If one of the values was assigned
by the second clause, then both were, and we have, for some U1
and U2, di+Wk=U1 and dj+Wk=U2 derivable. If U1 is Wk, we have
di+Wk=Wk derivable in r; if not, then part (i) of the lemma will
give di+Wk=Wk derivable in r. Similarly, dj+Wk=Wk is derivable
in r. Hence di+Wk=dj+Wk is derivable in r. This can
collectively be written: di+W=dj+W is derivable in r. Now
di+W->di+B and dj+W->dj+B are also derivable in r and have the
same identifier. We therefore have di+B=dj+B ∈ Drv1(r), and
therefore ti[B]=tj[B] since di+B and dj+B are in the same E-set.

Now we will show that there will be no FD violations between
t1' or t2' and any of the ti, i=2,...,n. So suppose R:W->B is in
s and t'[W]=ti[W] (where t' is t1' or t2'). For each component
Wj of W, t'[Wj]=ti[Wj]. First suppose Vk is assigned by a VEQ,
or by the third clause, where di+Wj ∈ Ek. The values assigned to
these E-sets are unique, so we also have 0+Wj ∈ Ek, i.e.,
Wj=di+Wj is derivable in r. Next suppose Vk was assigned by the
second clause: There is an X ∈ Ek with 1≤X≤deg(R). We have
di+Wj=X derivable, but from the lemma, we also get di+Wj=Wj
derivable. Collectively, we have W=di+W derivable in r. Since
W->B and di+W->di+B have the same identifier, we get B=di+B in
Drv1(r). If these domains (B and di+B) are in an E-set Ek whose
value is assigned by a VEQ, then t'[B]=ti[B]=Vk. Otherwise, Vk
is equal to u(t1[X]), where 1≤X≤deg(R) and X ∈ Ek. Since X=B
will be in Drv1(r), it is also in Drv1(R[X1=V1]) and if t' is
t1', then we know ti[B]=u(t1[X])=u(t1[B])=t1'[B]. If t' is t2',
then from B=di+B we can conclude that ∅->B is in Drv1(R[X1=V1]),
and this also yields ti[B]=t1'[B]=t2'[B]. Thus no FDs are
violated, and we may define a state st3 by R(st3) = R(st2) +
{t2,...,tn}. This state will have the property that r(st3)≠∅ and
Z1->A is false in R[X1=V1](st3).

We  still  may  have  e(st3)=0  because  selections  on  other 
relations  may  be  empty.  However,  by  a  process  similar  to  the  one 
above,  we  may  add  tuples  to  get  a  nonempty  cross  product. 

The  case  for  EQs  which  are  not  derivable  in  the  cross  product 
can  be  handled  in  an  analogous  fashion. 

This completes the case for cross product.

Now suppose that e is a union e1+e2 and that Z1->A ∉
Drv1(e1+e2) for every Z1 ⊆ Z. In particular, we have Z->A ∉
Drv1(e1+e2). Suppose Z->A is not derivable in e1. By induction
there is a state st1 such that Z->A is false in e1(st1). Then
Z->A will also be false in (e1+e2)(st1). If Z->A is not
derivable in e2, we proceed in a similar manner. In the
remaining case id1:Z->A ∈ Drv1(e1), id2:Z->A ∈ Drv1(e2) but
Rdc(id1)≠Rdc(id2). By Theorem 4.4, there is a state st and a
valuation x such that id1(st;x)≠id2(st;x) and by Theorem 4.5
there are tuples t1 ∈ e1(st) and t2 ∈ e2(st) such that
t1[Z]=t2[Z] but t1[A]≠t2[A]. Thus we will have t1, t2 ∈
(e1+e2)(st), and these tuples will contradict the FD Z->A.
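
The last step, finding t1 in e1(st) and t2 in e2(st) that agree on Z and differ on A, can be sketched in Python on concrete relations. The function name clashing_pair and the 0-based tuple positions are assumptions made for illustration; the sketch only searches given relations and does not construct the state st of Theorems 4.4 and 4.5.

```python
def clashing_pair(e1, e2, lhs, rhs):
    """Return (t1, t2) with t1 in e1, t2 in e2, equal on lhs and different on rhs.

    Returns None when no such pair exists, i.e. when the union of e1 and e2
    adds no cross-relation violation of the FD lhs -> rhs.
    """
    for t1 in e1:
        for t2 in e2:
            if all(t1[z] == t2[z] for z in lhs) and t1[rhs] != t2[rhs]:
                return t1, t2
    return None
```

Each of e1 = [("k", "x")] and e2 = [("k", "y")] satisfies the FD 0 -> 1 on its own, yet the pair returned for them contradicts that FD in the union.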

If an EQ is not derivable in e1+e2, it is not derivable in
either e1 or e2. We construct a counterexample state by
induction on the appropriate component (e1 or e2), and this will
also be a counterexample state for the union.




In the last case, e is a projection e1[X]. If c is any
constraint not derivable in e, then c1 will not be derivable in
e1, where c1 is c with each domain Y replaced by X[Y]. By
induction, we construct a counterexample state for c1 in e1 and
this will also be a counterexample state for c in e.

□

Theorem 5.7. Let s ∈ RSch, and let e,e1,e2 ∈ Exp(s). Then

(i) Drv1(e[X=Y][Z=V]) = Drv1(e[Z=V][X=Y])

(ii) Drv1((e1⊗e2)[X=V]) = Drv1(e1[X=V]⊗e2), if X≤deg(R)

(iii) Drv1((e1⊗e2)[X=V]) = Drv1(e1⊗(e2[X=V])), if X>deg(R)

(iv) Drv1((e1+e2)[X]) = Drv1(e1[X]+e2[X])

Proof. (i) From the definitions we have

Drv1(e[X=Y][Z=V]) = Cl1(Cl1(Drv1(e) + X=Y) + Z=V) and
Drv1(e[Z=V][X=Y]) = Cl1(Cl1(Drv1(e) + Z=V) + X=Y).

It is not hard to show that the inclusion Cl1(Cl1(Drv1(e) + X=Y) + Z=V) ⊆ Cl1(Cl1(Drv1(e) + Z=V) + X=Y) is true and also that the reverse inclusion holds.

(ii) The definitions give us

Drv1((e1⊗e2)[X=V]) = Cl1(Cl1(Drv1(e1) + Drv1'(e2)) + X=V) and

Drv1(e1[X=V]⊗e2) = Cl1(Cl1(Drv1(e1) + X=V) + Drv1'(e2)).

Again we can show equality by showing inclusion in both directions. Part (iii) is analogous.

(iv) The formulas give us

Drv1((e1+e2)[X]) = {id2:Z->A : id1:X[Z]->X[A] ∈ Drv1(e1+e2)} +
                   {Y=Z : X[Y]=X[Z] ∈ Drv1(e1+e2)} +
                   {Y=V : X[Y]=V ∈ Drv1(e1+e2)},

with id1, id2 as in the definition, and

Drv1(e1[X]+e2[X]) = {x ∈ Drv1(e1[X])*Drv1(e2[X]) : x is an EQ} +
                    {id1:Z->A ∈ Drv1(e1[X]) : there is
                     id2:Z->A ∈ Drv1(e2[X]) with
                     Rdc(id1)=Rdc(id2)}.

First we see that {Y=Z : X[Y]=X[Z] ∈ Drv1(e1+e2)} + {Y=V : X[Y]=V ∈ Drv1(e1+e2)} = {x ∈ Drv1(e1[X])*Drv1(e2[X]) : x is an EQ}.

Now an FD id1:X[Z]->X[A] is in Drv1(e1+e2) if and only if id1:X[Z]->X[A] ∈ Drv1(e1) and there is id2:X[Z]->X[A] in Drv1(e2) with Rdc(id1)=Rdc(id2). This is true if and only if id1':Z->A ∈ Drv1(e1[X]) and id2':Z->A ∈ Drv1(e2[X]), where id1' (id2') is obtained from id1 (id2) by adding an arc to each leaf node and to the root node. Now Rdc(id1')=Rdc(id1)=Rdc(id2)=Rdc(id2'), and so the condition is equivalent to id1':Z->A ∈ Drv1(e1[X]+e2[X]).

This shows that the two sets of FDs are the same.

□
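
Part (iv) concerns derivable constraints, but it rests on the familiar fact that projection distributes over union at the level of values. The Python sketch below only illustrates that value-level fact on sample data; it does not compute Drv1. The function name `project`, the sample relations and the 0-based column list are assumptions made for the example.

```python
def project(relation, columns):
    """Project a set of tuples onto the given 0-based column positions."""
    return {tuple(t[i] for i in columns) for t in relation}


# Hypothetical values of e1 and e2 in some state.
r1 = {(1, 2, 3), (4, 5, 6)}
r2 = {(1, 2, 9), (7, 8, 9)}
X = [0, 1]

union_then_project = project(r1 | r2, X)           # value of (e1+e2)[X]
project_then_union = project(r1, X) | project(r2, X)  # value of e1[X]+e2[X]
```

Since the two expressions always take the same value, it is reasonable to expect them to carry the same derivable constraints, which is what part (iv) establishes.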


Theorem 7.1. Let s ∈ RSch; let e be an expression over s, and let ts be a translation sequence. Then as long as the semantics are independent and at least medium:

(i) Noins(ts,e) => (∀st) e(ts(st)) ⊆ e(st);

(ii) Nodel(ts,e) => (∀st) e(st) ⊆ e(ts(st));

(iii) Iok(ts,e) => (∀st) e(ts(st))-e(st) ⊆ Ins(ts,e), and

(iv) Dok(ts,e) => (∀st) e(st)-e(ts(st)) ⊆ ##Del(ts,e).


(For  notational  convenience,  valuations  are  implicit.) 
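
Parts (i) and (ii) can be illustrated on a small instance. The Python sketch below is a simplified model and not the thesis's definitions: a state is a dictionary from relation names to sets of fully specified tuples, a translation sequence is a list of `("insert" | "delete", relation, tuple)` steps without variables or blanks, constraints are ignored, and "ts contains no insertions" (respectively "no deletions") stands in for Noins (respectively Nodel) in the base-relation case. All names are hypothetical.

```python
def apply_ts(state, ts):
    """Apply a list of (op, relation_name, tuple) steps to a copy of the state."""
    new_state = {name: set(rows) for name, rows in state.items()}
    for op, rel, w in ts:
        if op == "insert":
            new_state[rel].add(w)
        elif op == "delete":
            new_state[rel].discard(w)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_state


def e(state):
    """Example expression: select tuples of R with column 0 = column 1, then project on column 0."""
    return {(t[0],) for t in state["R"] if t[0] == t[1]}


st = {"R": {(1, 1), (2, 2), (3, 4)}}
ts_delete_only = [("delete", "R", (2, 2))]   # no insertions
ts_insert_only = [("insert", "R", (5, 5))]   # no deletions

before = e(st)
after_delete = e(apply_ts(st, ts_delete_only))
after_insert = e(apply_ts(st, ts_insert_only))
```

With only deletions the after-image is contained in the before-image, as in (i); with only insertions the before-image is contained in the after-image, as in (ii). The example expression uses only selection and projection, which preserve containment.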

Proof. (Proof of (i).) Suppose Noins(R). Since ts contains no insertions on R and because the semantics are independent and at least medium, R(ts(st)) ⊆ R(st).

If Noins(e[X]), then Noins(e) so for all st, e(ts(st)) ⊆ e(st), from which follows e[X](ts(st)) ⊆ e[X](st).

If Noins(e[X=Y]), then if Noins(e) we have e(ts(st)) ⊆ e(st) for all st, and so e[X=Y](ts(st)) ⊆ e[X=Y](st). If Iok(e) and Ins(e) ⊆ Cv(e[X=Y]), then we have, for all st, e(ts(st))-e(st) ⊆ Ins(e) ⊆ Cv(e[X=Y]). Now in the equation e[X=Y](ts(st)) = (e[X=Y](ts(st))-e(st)) + (e[X=Y](ts(st))*e(st)), the first term is empty because it is a subset of both e[X=Y](ts(st)) and Cv(e[X=Y]). The second term is a subset of e(st) and each tuple satisfies X=Y and so this term is a subset of e[X=Y](st). Hence e[X=Y](ts(st)) ⊆ e[X=Y](st).

If Noins(e[X=V]), we get e[X=V](ts(st)) ⊆ e[X=V](st) as for e[X=Y].




Suppose Noins(e1⊗e2). Then Noins(e1) and Noins(e2). Hence, for all st, e1(ts(st)) ⊆ e1(st) and e2(ts(st)) ⊆ e2(st), and this yields (e1⊗e2)(ts(st)) ⊆ (e1⊗e2)(st).

If Noins(e1+e2), then Noins(e1) and Noins(e2). Then for all st, e1(ts(st)) ⊆ e1(st) and e2(ts(st)) ⊆ e2(st), so (e1+e2)(ts(st)) = e1(ts(st)) + e2(ts(st)) ⊆ e1(st) + e2(st) = (e1+e2)(st).

Suppose Noins(e1-e2), and assume the first clause of [7] holds. Then Noins(e1) and Nodel(e2), so for all st, e1(ts(st)) ⊆ e1(st) and e2(st) ⊆ e2(ts(st)). A few set-theoretic operations give us (e1-e2)(ts(st)) ⊆ (e1-e2)(st). Now suppose the second clause of [7] holds. We show that (e1-e2)(ts(st))-(e1-e2)(st) is empty for every st, again by contradiction. If w ∈ (e1-e2)(ts(st))-(e1-e2)(st), then either w ∈ e1(ts(st))-e1(st) or w ∈ e2(st)-e2(ts(st)). In the first case, we get w ∈ Ins(e1) ⊆ Ins(e2) which means w ∈ e2(ts(st)), contradicting w ∈ (e1-e2)(ts(st)). The second case violates the assumption of Nodel(e2). Now suppose the third clause of [7] holds. If w ∈ (e1-e2)(ts(st))-(e1-e2)(st), then again, either w ∈ e1(ts(st))-e1(st) or w ∈ e2(st)-e2(ts(st)). The first case is impossible since Noins(e1) is an assumption. The second case and the assumption yield w ∈ ##Del(e1) which means w ∉ e1(ts(st)), which is also impossible.

(Proof of (ii).) (This proof is similar to part (i).)

Suppose Nodel(R). Since ts contains no deletions on R and because the semantics of RQ are independent and at least medium, R(st) ⊆ R(ts(st)).

If Nodel(e[X]), and Nodel(e), then for all st, e(st) ⊆ e(ts(st)) which means e[X](st) ⊆ e[X](ts(st)). Now suppose the second clause of [2] holds. We will show that e[X](st)-e[X](ts(st)) is empty by contradiction. Suppose w is in this set. Then for some extension w' of w, w' ∈ e(st)-e(ts(st)). From the suppositions we get w' ∈ ##Del(e), and w=w'[X] ∈ ##Del(e)[X] ⊆ Ins(e)[X] = Ins(e[X]). But this contradicts w ∈ e[X](st)-e[X](ts(st)).




If Nodel(e[X=Y]), then if Nodel(e), then for all st, e(st) ⊆ e(ts(st)) and it follows that e[X=Y](st) ⊆ e[X=Y](ts(st)). Otherwise, if Dok(e) and Del(e) ⊆ Cv(e[X=Y]), then we have e(st)-e(ts(st)) ⊆ ##Del(e) ⊆ Cv(e[X=Y]). Then in the equation e[X=Y](st) = (e[X=Y](st)-e(ts(st))) + (e[X=Y](st)*e(ts(st))), the first term is empty since it is a subset of both e[X=Y](st) and Cv(e[X=Y]), and the second term is a subset of e(ts(st)) and each tuple in it satisfies X=Y. Hence e[X=Y](st) ⊆ e[X=Y](ts(st)).

The case for e[X=V] is proved similarly.

If Nodel(e1⊗e2), then Nodel(e1) and Nodel(e2). So for all st, e1(st) ⊆ e1(ts(st)) and e2(st) ⊆ e2(ts(st)), from which follows (e1⊗e2)(st) ⊆ (e1⊗e2)(ts(st)).

Suppose Nodel(e1+e2) and assume the first clause of [6] holds. Then Nodel(e1) and Nodel(e2). So for all st, e1(st) ⊆ e1(ts(st)) and e2(st) ⊆ e2(ts(st)), so (e1+e2)(st) ⊆ (e1+e2)(ts(st)). Now assume that the second clause of [6] holds. We derive a contradiction to show that (e1+e2)(st)-(e1+e2)(ts(st)) is empty. So suppose w is an element of this set. Then either w ∈ e1(st)-e1(ts(st)) or w ∈ e2(st)-e2(ts(st)). The first case is impossible because of the hypothesis that Nodel(e1) holds. The second case is also impossible because the assumptions imply w ∈ ##Del(e2) ⊆ Ins(e1) which contradicts w ∉ (e1+e2)(ts(st)). The case where w ∈ e2(st)-e2(ts(st)) is analogous.

If Nodel(e1-e2), then Nodel(e1) and Noins(e2). This means that for all st, e1(st) ⊆ e1(ts(st)) and e2(ts(st)) ⊆ e2(st). From this we get (e1-e2)(st) ⊆ (e1-e2)(ts(st)).

(Proof of (iii).)

If Iok(R) (i.e., if the semantics are strong), then R(ts(st))-R(st) ⊆ Ins(R) because the semantics are independent and at least medium.

If Iok(e[X]), then Iok(e), so for all st, e[X](ts(st))-e[X](st) ⊆ (e(ts(st))-e(st))[X] ⊆ Ins(e)[X] = Ins(e[X]).

If Iok(e[X=Y]), then Iok(e), so for all st, e[X=Y](ts(st))-e[X=Y](st) = (e(ts(st))-e(st))[X=Y] ⊆ Ins(e)[X=Y] = Ins(e[X=Y]). Similarly for Iok(e[X=V]).




Suppose Iok(e1⊗e2). Then Noins(e1) and Noins(e2), which we know implies Noins(e1⊗e2). Therefore, for any state st, (e1⊗e2)(ts(st))-(e1⊗e2)(st) = ∅, which is, of course, a subset of Ins(e1⊗e2).

Suppose Iok(e1+e2). Let st be any state and suppose w ∈ (e1+e2)(ts(st))-(e1+e2)(st). Some set-theoretic manipulations give us w ∈ e1(ts(st))-e1(st) or w ∈ e2(ts(st))-e2(st). If Iok(e1) and Iok(e2), then w ∈ Ins(e1) or w ∈ Ins(e2), so w ∈ Ins(e1)+Ins(e2) = Ins(e1+e2). If Iok(e1) and Noins(e2), then e2(ts(st))-e2(st) = ∅, so w ∈ e1(ts(st))-e1(st) ⊆ Ins(e1) ⊆ Ins(e1+e2). Similarly, if Noins(e1) and Iok(e2), then w ∈ Ins(e1+e2).

Suppose Iok(e1-e2). Let st be any state and suppose w ∈ (e1-e2)(ts(st))-(e1-e2)(st). Then w ∈ e1(ts(st))-e1(st) ⊆ Ins(e1) or w ∈ e2(st)-e2(ts(st)) ⊆ ##Del(e2). Since Ins(e1) = ##Del(e2), Ins(e1) ⊆ Ins(e1-e2) and ##Del(e1) ⊆ Ins(e1-e2). Therefore, in either case, w ∈ Ins(e1-e2).

(Proof of (iv).) If Dok(R) (i.e., if the semantics are strong), then R(st)-R(ts(st)) ⊆ ##Del(R) since the semantics are independent and at least medium.

If Dok(e[X]), then Dok(e), so for all st, e[X](st)-e[X](ts(st)) ⊆ (e(st)-e(ts(st)))[X] ⊆ (##Del(e))[X] = (##(Del(e)[X~='-']))[X] = ##(Del(e)[X~='-'][X]) = ##Del(e[X]).

If Dok(e[X=Y]), then for any st, e[X=Y](st)-e[X=Y](ts(st)) = (e(st)-e(ts(st)))[X=Y] ⊆ e(st)-e(ts(st)) ⊆ ##Del(e) ⊆ ##Mb(e[X=Y],Del(e)) = ##Del(e[X=Y]). Similarly, Dok(e[X=V]) implies e[X=V](st)-e[X=V](ts(st)) ⊆ ##Del(e[X=V]), for all st.

Suppose Dok(e1⊗e2). Let st be any state and suppose w ∈ (e1⊗e2)(st)-(e1⊗e2)(ts(st)). This implies p1(w) ∈ e1(st)-e1(ts(st)) or p2(w) ∈ e2(st)-e2(ts(st)). If the first condition (in the definition) holds, then p1(w) ∈ ##Del(e1) or p2(w) ∈ ##Del(e2). In the former case, there is a w' ∈ Del(e1) with w'>p1(w), and hence a w" ∈ Del(e1)⊗{(-,...,-)} with w">w. We get, then, w ∈ ##(Del(e1)⊗{(-,...,-)}) ⊆ ##Del(e1⊗e2). Similarly, p2(w) ∈ e2(st)-e2(ts(st)) implies w ∈ ##Del(e1⊗e2).

If the second condition holds, then e1(st)-e1(ts(st))=∅ so we must have p2(w) ∈ e2(st)-e2(ts(st)) ⊆ ##Del(e2). If the third condition holds, we similarly get p1(w) ∈ ##Del(e1). In both cases, we proceed as above to get w ∈ ##Del(e1⊗e2).

If Dok(e1+e2), we get, for all st, (e1+e2)(st)-(e1+e2)(ts(st)) ⊆ (e1(st)-e1(ts(st))) + (e2(st)-e2(ts(st))) ⊆ ##Del(e1) + ##Del(e2) = ##Del(e1+e2) since Del(e1) = Del(e2) = Del(e1)*Del(e2).

Suppose Dok(e1-e2). Then for any state st, (e1-e2)(st)-(e1-e2)(ts(st)) ⊆ (e1(st)-e1(ts(st))) + (e2(ts(st))-e2(st)) ⊆ ##Del(e1) + Ins(e2) = ##Del(e1-e2), and this follows from the three conditions in the definition as in previous cases.

□




Glossary  of  Terms 


Dm, Dm1, Dm2              abstract data models (2.3)

Sch, Sch1, Sch2           set of schemas, first components of
                          abstract data models (2.3)

Str(s), Str1(s), Str2(s)  set of structures on schema s,
                          database instances not necessarily
                          satisfying schema constraints (2.3)

St(s), St1(s), St2(s)     set of states on schema s, database
                          instances satisfying all schema
                          constraints (2.3)

Q(s), Q1(s), Q2(s)        set of operations, functions on
                          structure sets, fourth component of
                          abstract data model (2.3)

s, s1, s2                 arbitrary schemas

q                         arbitrary operation

str, str'                 arbitrary structures

Mm                        abstract mapping model (2.3)

Ms(s1,s2)                 set of structure mappings from s1 to
                          s2, first component of abstract
                          mapping model, elements functions
                          Str1(s1) -> Str2(s2) (2.4)

m                         arbitrary structure mapping

Mq(s1,s2)                 set of operation mappings, second
                          component of abstract mapping model,
                          elements functions
                          Q2(s2) x Str1(s1) -> Q1(s1)* (2.4)

t                         arbitrary operation mapping, or
                          arbitrary tuple (in Part III)

v1, v2, v3, ...           formal variables

¬                         logical 'not'

+                         set union, relational algebra operator
                          (4.5), logical 'or'

*                         set intersection, relational algebra
                          operator (4.5), logical 'and'

=>                        'if...then...'

-                         set difference, relational algebra
                          operator (4.5)

∈                         set membership

⊆                         set inclusion

∉                         negation of set membership

∅                         null set

∀                         universal quantifier, 'for all'

∃                         existential quantifier, 'for some'




Y 

provability relation (Ch.3)

l  ii  nr 

truth  in  an  interpretation  (Ch.3) 

'is defined', 'if and only if'

RDm                       relational data model (Ch.4, Ch.5)

RSch                      set of relational schemas (4.1)

RStr(s)                   set of relational structures for
                          schema s (4.2)

RSt(s)                    set of relational states for schema s
                          (4.4)

Exp(s)                    set of relational algebra expressions
                          on schema s (4.5)

e, e1, e2                 elements of Exp(s), relational algebra
                          expressions

e(str), e(st)             value of expression e in structure str
                          or state st (4.6)

X->Y (FD)                 functional dependency (FD) (4.1, 4.3)

X=Y (DEQ)                 domain equality (DEQ) (4.1, 4.3)

X=V (VEQ)                 domain X constant with value V (VEQ)
                          (4.1, 4.3)

RMs(s1,s2)                set of relational structure mappings
                          from schema s1 to schema s2 (4.7)

FDid(s)                   functional dependency identifiers on
                          schema s, trees of basic FDs (4.8)

id, id1, id2              elements of FDid(s)

Rdc                       function to eliminate "irrelevant"
                          arcs in FD identifiers (4.10)

Cl(S)                     closure of set S of FDs and EQs, all
                          FDs and EQs derivable from S (4.11)

Drv(e)                    set of derivable FDs and EQs on
                          expression e (4.12)

Cl1(S)                    closure of constraints in S without
                          augmentation rule (4.13)

Drv1(e)                   FDs and EQs derivable on e without
                          augmentation rule (4.13)

id(st;x)                  the value of FD identifier id in state
                          st with valuation x (4.14)

e1≡e2                     equivalence of two relational algebra
                          expressions (5.1)

e1=e2 wrt w               e1 contains w if and only if e2 does
                          (4.15)

RSch*                     set of relational schemas including
                          subset constraints (5.3)

R[X] ⊆ S[Y] (SS)          subset constraint (SS) (5.3)

RDm*                      relational data model with subset
                          constraints (5.3)

RMs*                      state component of relational mapping
                          model with subset constraints (5.3)



Cl*(S)                    closure of set S of FDs, EQs and SSCs,
                          all FDs, EQs and SSCs derivable from
                          elements of S (5.4)

Tup(n)                    set of n-tuples of positive integers
                          (6.1)

BTup(n)                   set of n-tuples whose components are
                          positive integers or blanks (6.1)

RQ(s)                     relational operations over schema s of
                          the form insert(R,w) and delete(R,w)
                          (6.1)

w<w'                      every specified coordinate of w equals
                          that of w' (6.2)

#S                        set of all tuples (with blanks) > some
                          member of S (6.2)

##S                       set of all tuples (without blanks) >
                          some member of S (6.2)

RMq(s1,s2)                set of relational operation mappings,
                          elements finite sets of associations
                          op(R,w) -> op1(R1,w1);...;opn(Rn,wn)
                          (6.4)

VTup(n)                   set of n-tuples whose coordinates are
                          variables v1,v2,v3,..., positive
                          integers 0,1,2,... or blank '-' (6.4)

w/x/                      the result of replacing the variables
                          in w by the values specified in
                          valuation x (6.4)

Ts(s)                     set of translation sequences over
                          schema s, elements sequences
                          op1(R1,w1);...;opn(Rn,wn), where
                          opi(Ri,wi) ∈ RQ(s) (6.4)

ts                        arbitrary translation sequence (6.4)

ts/x/                     arbitrary translation sequence with
                          each variable tuple w replaced by w/x/
                          (6.4)

ts(st;x)                  the result (or set of results if
                          nondeterministic) of applying
                          translation sequence ts/x/ to state
                          (or structure) st (6.4)

Str-Ins(ts,e,w)           translation sequence ts has the
                          property that for every substitution x
                          of variables and for every legal state
                          for which the result of applying ts/x/
                          is also legal, the after-image of
                          expression e will equal the before-image
                          of e with exactly w/x/ added (6.5)

Str-Del(ts,e,w)           as above except that the after-image
                          of e will equal the before-image of e
                          with exactly the tuples specified by
                          w/x/ (the set ##w/x/) removed (6.5)

Med-Ins(ts,e,w)           as above except that the after-image
                          of e will at least include the
                          before-image plus w/x/ (6.5)

Med-Del(ts,e,w)           as above except that the after-image
                          of e will be the before-image with at
                          least the tuples specified by w/x/
                          (the set ##w/x/) removed (6.5)

Wk-Ins(ts,e,w)            as above except that the after-image
                          of e will at least contain w/x/ (6.5)




Wk-Del(ts,e,w)            as above except that the after-image
                          of e will not contain any tuple
                          specified by w/x/ (the set ##w/x/)
                          (6.5)

Cv(e)                     set of variable tuples which possess
                          an intrinsic violation of a valid
                          constraint of e (6.6)

Ins(ts,e)                 set of variable tuples derived by
                          finite procedure from translation
                          sequence ts and expression e which
                          will always be inserted into e by ts
                          (6.7)

Mb(c,T)                   set of tuples derived from T in which
                          blanks have been moved according to
                          the "equivalences" introduced by EQ c
                          (6.8)

Del(ts,e)                 set of tuples derived by finite
                          procedure from translation sequence ts
                          and expression e which will always be
                          deleted from e by ts (6.9)

Noins(ts,e)               predicate which, if true, means that
                          after-images of e contain no new
                          tuples (7.1)

Nodel(ts,e)               predicate which, if true, means that
                          after-images of e contain at least all
                          the old tuples (7.1)

Iok(ts,e)                 predicate which, if true, means that
                          only tuples calculated by Ins are ever
                          inserted into e by ts (7.1)

Dok(ts,e)                 predicate which, if true, means that
                          only tuples calculated by Del are ever
                          deleted from e by ts (7.1)
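
The glossary entries for the ordering on tuples with blanks and for ##S (6.2) can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the thesis's construction: a blank is represented by `None`, the ordering is read as "every specified coordinate of w equals that of w'", and ##S is computed over a small finite domain supplied by the caller, whereas the glossary's tuples range over all positive integers. The names `BLANK`, `below` and `completions` are hypothetical.

```python
from itertools import product

BLANK = None  # stand-in for the blank coordinate


def below(w, w2):
    """True if every specified (non-blank) coordinate of w equals that of w2."""
    return len(w) == len(w2) and all(a is BLANK or a == b for a, b in zip(w, w2))


def completions(S, domain):
    """Blank-free tuples over a finite `domain` that lie above some member of S."""
    S = list(S)
    if not S:
        return set()
    n = len(S[0])
    return {t for t in product(domain, repeat=n) if any(below(w, t) for w in S)}


deleted = {(1, BLANK)}                      # e.g. a Del set with one partly blank tuple
specified = completions(deleted, [1, 2])    # finite analogue of ##deleted
```

Under this reading, a tuple such as (1, blank) stands for every fully specified tuple whose first coordinate is 1, which is how ##Del(ts,e) is used in the proofs of Theorem 7.1.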


UNIVERSITY  OF  TORONTO 


COMPUTER  SYSTEMS  RESEARCH  GROUP 
BIBLIOGRAPHY  OF  CSRG  TECHNICAL  REPO RTS+ 

*  CSRG-1  EMPIRICAL  COMPARISON  OF  LR(k)  AND  PRECEDENCE  PARSERS 

J.J.  Horning  and  W.R.  Lalonde,  September  1970 
[ACM  SIGPLAN  Notices,  November  1970] 

*  CSRG-2  AN  EFFICIENT  LALR  PARSER  GENERATOR 


W.R.  Lalonde,  February  1 97 1  [M.A.Sc.  Thesis,  EE  1971] 

*  CSRG-3 

A  PROCESSOR  GENERATOR  SYSTEM 

J.D.  Gorrie,  February  1971  [ M. A. Sc.  Thesis,  EE  1971] 

*  CSRG- 4 

DYLAN  USER'S  MANUAL 

P.E.  Bonzon,  March  1971 

CSRG-5 

DIAL  -  A  PROGRAMMING  SYSTEM  FOR  INTERACTIVE  ALGEEPAIC 
MANIPULATION 

Alan  C.M.  Brown  and  J.J.  Horning,  March  1971 

CSRG-6 

ON  DEADLOCK  IN  COMPUTER  SYSTEMS 

Richard  C.  Holt,  April  1971 

[Ph.D.  Thesis,  Dept,  of  Computer  Science, 

Cornell  University,  1971  ] 

CSRG- 7 

THE  STAR-RING  SYSTEM  OF  LOOSELY  COUPLED  DIGITAL  DEVICES 

John  Neill  Thomas  Potvin,  August  1971 
[M.A.Sc.  Thesis,  EE  1971] 

*  CSRG-8 

FILE  ORGANIZATION  AND  STRUCTURE 

G.M.  Stacey,  August  1971 

CSRG- 9 

DESIGN  STUDY  FOR  A  TWO-DIMENSIONAL  COMPUTER- ASS ISTE D 
ANIMATION  SYSTEM 

Kenneth  F.  Evans,  January  1972  [M.Sc.  Thesis,  DCS,  1972] 

*  CSRG-1Q  HOW  A  PROGRAMMING  LANGUAGE  IS  USED 

William  Gregg  Alexander,  February  1972 

[M.Sc.  Thesis,  DCS  1971;  Computer,  v.3,  n.11,  November  1975] 

*  CSRG-1 1  PROJECT  SUE  STATUS  REPORT 

J.W.  Atwood  (ed. ) ,  April  1972 

*  CSRG-12  THFEE  DIMENSIONAL  DATA  DISPLAY  WITH  HIDDEN  LINE  REMOVAL 

Rupert  Bramall,  April  1972  [M.Sc.  Thesis,  DCS,  1971] 

*  CSRG- 1 3  A  SYNTAX  DIRECTED  ERROR  RECOVERY  METHOD 

Lewis  R.  James,  May  1972  [M.Sc.  Thesis,  DCS,  1972] 

+  Abbreviations: 

DCS  -  Department  of  Computer  Science,  University  of  Toronto 
EE  -  Department  of  Electrical  Engineering,  University  of 
Toronto 

*  -  Out  of  print 


CSRG-14  THE  USE  OF  SERVICE  TIME  DISTRIBUTIONS  IN  SCHEDULING 
Kenneth  C.  Sevcik,  May  1972 

[Ph.D.  Thesis,  Committee  on  Information  Sciences, 
University  of  Chicago,  1971;  JACM,  January  1974] 

CSRG-15  PROCESS  STRUCTURING 

J. J.  Horning  and  B.  Randell,  June  1972 
[ACM  Computing  Surveys,  March  1973] 

CSRG-16  OPTIMAL  PROCESSOR  SCHEDULING  WHEN  SERVICE  TIMES  ARE 
HYPER EXPONENTIALLY  DISTRIBUTED  AND  PREEMTION  OVERHEAD 
IS  NOT  NEGLIGIBLE 
Kenneth  C.  Sevcik,  June  1972 

[Proceedings  of  the  Symposium  on  Computer-Communication, 
Networks  and  Teletraffic,  Polytechnic  Institute  of 
Brooklyn ,  1972] 

*  CSRG-17  PROGRAMMING  LANGUAGE  TRANSLATION  TECHNIQUES 

W.M.  McKeeman,  July  1972 

CSRG-18  A  COMPARATIVE  ANALYSIS  OF  SEVERAL  DISK  SCHEDULING 
ALGORITHMS 

C.J.M.  Turnbull,  September  1972 

CSBG-19  PROJECT  SUE  AS  A  LEARNING  EXPERIENCE 

K. C.  Sevcik  et  al,  September  1972 

[Proceedings  AFIPS  Fall  Joint  Computer  Conference, 
v.  41,  December  1972] 

*  CSRG-20  A  STUDY  OF  LANGUAGE  DIRECTED  COMPUTER  DESIGN 

David  B.  Wortman ,  December  1972 

[Ph.D.  Thesis,  Computer  Science  Department, 

Stanford  University,  1972] 

CSRG-21  AN  APL  TERMINAL  APPROACH  TO  COMPUTER  MAPPING 

R.  Kvaternik,  December  1972  [M.Sc.  Thesis,  DCS,  1972] 

*  CSRG-22  AN  IMPLEMENTATION  LANGUAGE  FOR  MINICOMPUTERS 

G.G.  Kalmar,  January  1973  [M.Sc.  Thesis,  DCS,  1972] 

CSRG-23  COMPILER  STRUCTURE 

W.M.  McKeeman,  January  1973 

[Proceedings  of  the  USA-Japan  Computer  Conference,  1972] 

*  CSRG-24  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER  PROGRAM 

ENGINEERING 

J.D.  Gannon  (ed. ) ,  March  1973 


CSRG-25  THE  INVESTIGATION  OF  SERVICE  TIME  DISTRIBUTIONS 

Eleanor  A.  Lester,  April  1973  [M.Sc.  Thesis,  DCS,  1973] 

*  CSRG-26  PSYCHOLOGICAL  COMPLEXITY  CF  COMPUTER  PROGRAMS: 

AN  INITIAL  EXPERIMENT 
Larry  Weissman,  August  1973 

*  CSRG-27  STRUCTURED  SUESETS  OF  THE  PI/I  LANGUAGE 

Richard  C.  Holt  and  David  B.  Wortman,  October  1973 


*  CSRG-28  ON  THE  REDUCED  MATRIX  REPRESENTATION  OF  LR(k) 

PARSER  TABLES 

Marc  Louis  Joliat,  October  1973  [Ph.D.  Thesis,  EE  1973] 

*  CSRG-29  A  STUDENT  PROJECT  FCR  AN  OPERATING  SYSTEMS  COURSE 

B.  Czarnik  and  D.  Tsichritzis  (eds.)  ,  November  1973 

*  CSRG-30  A  PSEUDO-MACHINE  FCR  CODE  GENERATION 

Henry  John  Pasko,  December  1973  [M.Sc.  Thesis,  DCS  1973] 

*  CSRG-31  AN  ANNOTATED  BIELIOGRAPHY  ON  COMPUTER  PROGFAM 

ENGINEERING 

J.D.  Gannon  (ed. ) ,  Second  Edition,  March  1974 

*  CSRG-32  SCHEDULING  MULTIPLE  RESOURCE  COMPUTER  SYSTEMS 

E. D.  Lazowska,  May  1974  [M.Sc.  Thesis,  DCS,  1974] 

*  CSRG-33  AN  EDUCATIONAL  DATA  BASE  MANAGEMENT  SYSTEM 

F.  Lochovsky  and  D.  Tsichritzis,  May  1974  [INFOR, 
to  appear] 

*  CSRG-34  ALLOCATING  STORAGE  IN  HIERARCHICAL  DATA  BASES 

P.  Bernstein  and  D.  Tsichritzis,  May  1974  [Information 
Systems  Journal,  v. 1 ,  pp. 133-140] 

*  CSRG-35  ON  IMPLEMENTATION  OF  RELATIONS 

D.  Tsichritzis,  May  1974 

*  CSRG-36  SIX  PL/I  COMPILERS 

D. B.  Wortman,  P.J.  Khaiat,  and  D.M.  Lasker,  August  1974 
[Software  Practice  and  Experience,  v.b,  n. 3, 

July-Sept.  1976] 

*  CSRG-37  A  METHODOLOGY  FOR  STUDYING  THE  PSYCHOLOGICAL  COMPLEXITY 

OF  COMPUTER  PROGRAMS 

Laurence  M.  Weissman,  August  1974 

[Ph.D-  Thesis,  DCS,  1974] 

*  CSRG-38  AN  INVESTIGATION  OF  A  NEW  METHOD  OF  CONSTRUCTING 

SOFTWARE' 

David  M.  Lasker,  September  1974  [M.Sc.  Thesis,  DCS,  1974] 

CSRG-39  AN  ALGEBRAIC  MODEL  FOR  STRING  PATTERNS 

Glenn  F.  Stewart,  September  1974  [M.Sc.  Thesis,  DCS,  1974] 

*  CSRG-40  EDUCATIONAL  DATA  BASE  SYSTEM  USER'S  MANUAL 

J.  Klebanoff,  F.  Lochovsky,  A.  Fozitis,  and 
D.  Tsichritzis,  September  1974 

*  CSRG-41  NOTES  FROM  A  WORKSHOP  ON  THE  ATTAINMENT  OF 

RELIABLE  SOFTWARE 

David  B.  Wortman  (ed.),  September  1974 

*  CSRG-42  THE  PROJECT  SUE  SYSTEM  LANGUAGE  REFERENCE  MANUAL 

B.L.  Clark  and  F.J.B.  Ham,  September  1974 


9 


*  CSRG-43  A  DATA  BASE  PROCESSOR 

E.A.  Ozkarahan,  S.  A.  Schuster  and  K.C.  Smith 
November  1974  [Proceedings  National  Computer 
Conference  1975,  v.44,  pp. 379-388] 

*  CSRG-44  MATCHING  PROGRAM  AND  DATA  REPRESENTATION  TO  A 

COMPUTING  ENVIRONMENT 

Eric  C.R.  Hehner,  November  1974  [Ph.D.  Thesis,  DCS,  1974] 

*  CSRG-45  THREE  APPROACHES  TO  RELIABLE  SOFTWARE;  LANGUAGE 

DESIGN,  DYADIC  SPECIFICATION,  COMPLEMENTARY  SEMANTICS 
J.  E.  Donahue,  J. D.  Gannon,  J.V.  Guttag  and 
J.J.  Horning,  December  1974 

CSRG-46  THE  SYNTHESIS  OF  OPTIMAL  DECISION  TREES  FROM 
DECISION  TABLES 

Helmut  Schumacher,  December  1974 
[ M. Sc.  Thesis,  DCS,  1974] 

*  CSRG-47  LANGUAGE  DESIGN  TO  ENHANCE  PROGRAMMING  RELIABILITY 

John  D.  Gannon,  January  1975  [Ph.D.  Thesis,  DCS,  1975] 

*  CSRG-48  DETERMINISTIC  LEFT  TO  RIGHT  PARSING 

Christopher  J.M.  Turnbull,  January  1975 
[Ph.D.  Thesis,  EE,  1974] 

*  CSRG-49  A  NETWORK  FRAMEWORK  FOR.  RELATIONAL  IMPLEMENTATION 

D.  Tsichritzis,  February  1975  [in  Data  Base 
Description,  Dongue  and  Nijssen  (eds.).  North 
Holland  Publishing  Co. ] 

*  CSRG-50  A  UNIFIED  APPROACH  TO  FUNCTIONAL  DEPENDENCIES 

AND  RELATIONS 

P. A.  Bernstein,  J.R.  Swenson  and  D-C.  Tsichritzis 
February  1975  [Proceedings  of  the  ACM  SIGMOD  Conference, 
1975  ] 

*  CSRG-51  ZETA:  A  PROTOTYPE  RELATIONAL  DATA  BASE 

MANAGEMENT  SYSTEM 

M.  Brodie  (ed) .  February  1975  [Proceedings  Pacific 
ACM  Conference,  1975] 

CSRG-52  AUTOMATIC  GENERATION  OF  SYNTAX-REPAIRING  AND 
PARAGRAPHING  PARSERS 

David  T.  Barnard,  March  1975  [M.Sc.  Thesis,  DCS,  1975] 

*  CSRG-53  QUERY  EXECUTION  AND  INDEX  SELECTION  FOR  RELATIONAL 

DATA  BASES 

J. H.  Gilles  Farley  and  Stewart  A.  Schuster,  March  1975 

CSRG-54  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER 
PROGRAM  ENGINEERING 

J.V.  Guttag  (ed. ) ,  Third  Edition,  April  1975 

CSRG-55  STRUCTURED  SUBSETS  OF  THE  PL/1  LANGUAGE 

Richard  C.  Holt  and  David  B.  Wortman,  May  1975 


*  CSRG-56  FEATURES  OF  A  CONCEPTUAL  SCHEMA 

D.  Tsichritzis,  June  1975  [Proceedings  Very  Large 
Data  Base  Conference,  1975] 

*  CSRG-57  MERLIN:  TOWARDS  AN  IDEAL  PROGRAMMING  LANGUAGE 

Eric  C.R.  Hehner,  July  1975 

CSRG-58  ON  THE  SEMANTICS  OF  THE  RELATIONAL  DATA  MODEL 
Hans  Albrecht  Schmid  and  J.  Richard  Swenson, 

July  1975  [Proceedings  of  the  ACM  SIGMOD  Conference, 
1975  ] 

*  CSRG-59  THE  SPECIFICATION  AND  APPIICATION  TO  PROGRAMMING 

OF  ABSTRACT  DATA  TYPES 

John  V.  Guttag,  September  1975  [Ph.D.  Thesis,  DCS,  1975 

*  CSRG-60  NORMALIZATION  AND  FUNCTIONAL  DEPENDENCIES  IN  THE 

RELATIONAL  DATA  BASE  MODEL 

Phillip  Alan  Bernstein,  October  1975 

[Ph.D.  Thesis,  DCS,  1975] 

*  CSRG-61  LSL :  A  LINK  AND  SELECTION  LANGUAGE 

D.  Tsichritzis,  November  1975  [Proceedings  ACM 
SIGMOD  Conference,  1976] 

*  CSRG-62  COMPLEMENTARY  DEFINITIONS  OF  PROGRAMMING 

LANGUAGE  SEMANTICS 

James  E.  Donahue,  November  1975 

[Ph.D.  Thesis,  DCS,  1975] 

CSRG-63  AN  EXPERIMENTAL  EVALUATION  OF  CHESS  PLAYING 
HEURISTICS 

Lazio  Sugar,  December  1975  [M.Sc.  Thesis,  DCS,  1975] 

CSRG-64  A  VIRTUAL  MEMORY  SYSTEM  FOR  A  RELATIONAL 
ASSOCIATIVE  PROCESSOR 

S.A.  Schuster,  E.A.  Ozkarahan,  and  K.C.  Smith, 

February  1976  [Proceedings  National  Computer 
Conference  1976,  v.45,  pp. 855-862] 

CSRG-65  PERFORMANCE  EVALUATION  OF  A  RELATIONAL 
ASSOCIATIVE  PROCESSOR 

E. A.  Ozkarahan,  S.A.  Schuster,  and  K.C.  Sevcik, 

February  1976  [ACM  Transactions  on  Database 
Systems,  v. 1 ,  n:4,  December  1976] 

CSRG-66  EDITING  COMPUTER  ANIMATED  FILM 
Michael  D.  Tilson,  February  1976 
[M.Sc.  Thesis,  DCS,  1975] 

CSRG-67  A  DIAGRAMMATIC  APPROACH  TO  PROGRAMMING  LANGUAGE 
SEMANTICS 

James  R.  Cordy,  March  1976  [M.Sc.  Thesis,  DCS,  1976] 

*  CSRG-68  A  SYNTHETIC  ENGLISH  QUERY  LANGUAGE  FOR  A 

RELATIONAL  ASSOCIATIVE  PROCESSOR 

L. Kerschber g ,  E.A.  Ozkarahan,  and  J.E.S.  Pacheco 
April  1976 


CSRG-69  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER  PROGRAM  ENGINEERING 

D.  Barnard  and  D.  Thompson  (Eds-),  Fourth  Edition,  May  1976 

*  CSRG-70  A  TAXONOMY  OF  DATA  MODELS 

L.  Kerschberg,  A.  Klug,  and  D.  Tsichritzis,  May  1976 
[Proceedings  Very  Large  Data  Base  Conference,  1976] 

*  CSRG-71  OPTIMIZATION  FEATURES  FOR  THE  ARCHITECTURE  OF  A 

DATA  BASE  MACHINE 

E.  A.  Ozkarahan  and  K.C.  Sevcik,  May  1976 

*  CSRG-72  THE  RELATIONAL  DATA  BASE  SYSTEM  OMEGA  -  PROGRESS  REPORT 

H.  A.  Schmid  (ed.),  P.A.  Eemstein  (ed.),  B.  Arlow, 

R.  Baker  and  S.  Pozgaj,  July  1976 

CSRG-73  AN  ALGORITHMIC  APPROACH  TO  NORMALIZATION  OF 
RELATIONAL  DATA  BASE  SCHEMAS 
P.A.  Bernstein  and  C.  Beeri,  September  1976 

*  CSRG-74  A  HIGH-LEVEL  MACHINE-ORIENTED  ASSEMBLER  LANGUAGE 

FOR  A  DATA  BASE  MACHINE 

E. A.  Ozkarahan  and  S.A.  Schuster,  October  1976 

CSRG-75  CO  CONSIDERED  OD:  A  CONTRIBUTION  TO  THE 
PROGRAMMING  CALCULUS 
Eric  C.R.  Hehner,  November  1976 

CSRG-76  "SOFTWARE  HUT":  A  COMPUTER  PROGRAM  ENGINEERING 
PROJECT  IN  THE  FORM  OF  A  GAME 
J.J.  Horning  and  D.B.  Wortman,  November  1976 

CSRG-77  A  SHORT  STUDY  OF  PROGRAM  AND  MEMORY  POLICY  BEHAVIOUR 
G.  Scott  Graham,  January  1977 

CSRG-78  A  PANACHE  OF  DEMS  IDEAS 

D.  Tsichritzis,  February  1977 

CSRG-79  THE  DESIGN  AND  IMPLEMENTATION  OF  AN  ADVANCED  LA LR 
PARSE  TABLE  CONSTRUCTOR 

David  H.  Thompson,  April  1977  [M.Sc.  Thesis,  DCS,  1976] 

CSRG-80  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER  PROGRAM  ENGINEERING 
D.  Barnard  (Ed.),  Fifth  Edition,  May  1977 

CSRG-81  PROGRAMMING  METHODOLOGY:  AN  ANNOTATED  BIBLIOGRAPHY  FOR 
IFIP  WORKING  GROUP  2.3 

Sol  J.  Greenspan  and  J.J.  Horning  (Eds.),  First  Edition,
May  1977

CSRG-82  NOTES  ON  EUCLID 

edited  by  W.  David  Elliot  and  David  T.  Barnard,
August  1977

CSRG-83  TOPICS  IN  QUEUEING  NETWORK  MODELING 
edited  by  G.  Scott  Graham,  July  1977 


CSRG-84  TOWARD  PROGRAM  ILLUSTRATION 

Edward  Yarwood,  September  1977  [M.Sc.  Thesis,  DCS,  1974] 


CSRG-85  CHARACTERIZING  SERVICE  TIME  AND  RESPONSE  TIME  DISTRIBUTIONS 
IN  QUEUEING  NETWORK  MODELS  OF  COMPUTER  SYSTEMS 
Edward  D.  Lazowska,  September  1977 
[Ph.D.  Thesis,  DCS,  1977] 

CSRG-86  MEASUREMENTS  OF  COMPUTER  SYSTEMS  FOR 
QUEUEING  NETWORK  MODELS 
Martin  G.  Kienzle,  October  1977 
[M.Sc.  Thesis,  DCS,  1977] 

CSRG-87  'OLGA'  LANGUAGE  REFERENCE  MANUAL 

B.  Abourbih,  H.  Trickey,  D.M.  Lewis,  E.S.  Lee,
P.I.P.  Boulton,  November  1977

CSRG-88  USING  A  GRAMMATICAL  FORMALISM  AS  A  PROGRAMMING  LANGUAGE 
Brad  A.  Silverberg,  January  1978 
[M.Sc.  Thesis,  DCS,  1978] 

CSRG-89  ON  THE  IMPLEMENTATION  OF  RELATIONS:  A  KEY  TO  EFFICIENCY 
Joachim  W.  Schmidt,  January  1978 

CSRG-90  DATA  BASE  MANAGEMENT  SYSTEM  USER  PERFORMANCE 
Frederick  H.  Lochovsky,  April  1978 
[Ph.D.  Thesis,  DCS,  1978] 

CSRG-91  SPECIFICATION  AND  VERIFICATION  OF  DATA  BASE 
SEMANTIC  INTEGRITY 

Michael  Lawrence  Brodie,  April  1978 
[Ph.D.  Thesis,  DCS,  1978] 

CSRG-92  "STRUCTURED  SOUND  SYNTHESIS  PROJECT  (SSSP):
AN  INTRODUCTION"

by  William  Buxton,  Guy  Fedorkow,  with
Ronald  Baecker,  Gustav  Ciamaga,  Leslie  Mezei
and  K.C.  Smith,  June  1978

CSRG-93  "A  DEVICE-INDEPENDENT,  GENERAL-PURPOSE  GRAPHICS
SYSTEM  IN  A  MINICOMPUTER  TIME-SHARING 
ENVIRONMENT" 

William  T.  Reeves,  August  1978 
[M.Sc.  Thesis,  DCS,  1976] 

CSRG-94  "ON  THE  AXIOMATIC  VERIFICATION  OF  CONCURRENT  ALGORITHMS"
Christian  Lengauer,  August  1978 
[M.Sc.  Thesis,  DCS,  1978] 

CSRG-95  PISA:  "A  PROGRAMMING  SYSTEM  FOR  INTERACTIVE 
PRODUCTION  OF  APPLICATION  SOFTWARE" 

Rudolf  Marty,  August  1978 

CSRG-96  "ADAPTIVE  MICROPROGRAMMING  AND  PROCESSOR  MODELING" 

Walter  G.  Rosocha 
[Ph.D.  Thesis,  EE,  August  1978]

CSRG-97  DESIGN  ISSUES  IN  THE  FOUNDATION  OF  A  COMPUTER-BASED 
TOOL  FOR  MUSIC  COMPOSITION 
William  Buxton
[M.Sc.  Thesis,  CSRG,  October  1978]


CSRG-98 

